source("custom_functions.R")Mucosal microbiota alterations in primary sclerosis cholangitis persist after liver transplantation and are associated with clinical features independently of geography
1. Hypothesis: PSC is associated with the gut microbiome changes
1 Introduction
1.1 About
Pre_LTx vs Post_LTx vs Healthy analysis on merged data
Alpha diversity – Richness, Shannon, Simpson, and Pielou indices tested by linear fixed effect or mixed effect model (group effect, country effect, interaction effect)
Beta diversity – PERMANOVA (group effect, country effect, interaction effect), PCA visualization
Differential abundance testing:
o Group effect – linDA + MaAsLin2 intersection
o Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries
ML models, trained and validated using out-of-sample bootstrap (500 resamplings):
o ENET
o kNN
o GBoost
o RF
Pairwise comparisons:
Pre_ltx vs Healthy – an effect of disease, liver cirrhosis and overall bad clinical condition
Pre_ltx vs post_ltx – an effect of transplantation + general improvement of the clinical state
Post_ltx vs Healthy – does LTx lead to „healthy microbiome”?
Report structure The analysis is divided into two segments: terminal ileum and colon. Each part begins with data filtering, where sequencing depth is assessed and nearZeroVar() is applied to remove taxa with no variance in the dataset. This is followed by a series of analytical steps:
Alpha diversity
Beta diversity
Univariate testing
Machine learning models training and validation
At the end of the analysis, a brief summary of the results is provided.
Importing libraries and custom functions built for this analysis
1.2 Clinical characteristics
clinical_char <- read.xlsx("../results/clinical_characteristics.xlsx",
colNames = FALSE)
colnames(clinical_char) <- clinical_char[3,]
clinical_char <- clinical_char[-c(1,2,3),]
header_groups <- c(" " = 2, "HC" = 2, "pre_LTx" = 2, "post_LTx_non-rPSC"=2, "post_LTx_rPSC"=2)knitr::kable(clinical_char, align="l", escape=FALSE,digits=3) %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive")) %>%
kable_styling(full_width = TRUE) %>%
add_header_above(header_groups)| NA | Norway | Czech | Norway | Czech | Norway | Czech | Norway | Czech | |
|---|---|---|---|---|---|---|---|---|---|
| 4 | F/M [n] | 10/30 | 30/26 | 19/65 | 12/19 | 15/36 | 26/57 | 13 | 3/22 |
| 5 | Age [years] | NA | 50 (23;68) | 40 (17;77) | 35 (17;63) | 49 (23;70) | 48 (32;81) | NA | 48 (26;81) |
| 6 | IBD yes/no [n] | - | - | 66/18 | 25/6 | 31/7 | 72/11 | 11/2 | 24/1 |
| 7 | Total bilirubin [µmol/l] | - | 21 (3;68) | 24 | 62 (9;500) | 19.5 (11;49) | 18 (5;132) | 24 (13;52) | 24 (6;148) |
| 8 | AST [µkat/l] | - | 0.5 (0.3;1.9) | 1.1 (0.2;8.6) | 1.6 (0.5;4.0) | 0.6 (0.2;3.2) | 0.4 (0.2;1.7) | 1.1 (0.3, 4.0) | 0.5 (0.2;1.9) |
| 9 | ALT [µkat/l] | - | 0.5 (0.3;1.7) | 1.6 (0.5;8.6) | 1.7 (0.5;5.2) | 0.6 (0.1,4.1) | 0.5 (0.3;4.0) | 1.1 (0.2;4.5) | 0.5 (0.3;3.5) |
| 10 | ALP [µkat/l] | - | 1.3 (0.7;2.3) | 4.2 (0.9;17.3) | 6.0 (1.6;17.8) | 1.3 (0.6;7.9) | 1.4 (0.6;3.5) | 3.7 (0.5;13.9) | 2.3 (0.5;13.1) |
| 11 | GGT [µkat/l] | - | 0.4 (0.2;5.0) | - | 2.6 (0.6;15.2) | - | 0.4 (0.1; 4.8) | - | 1.4 (0.3;13.7) |
| 12 | INR [-] | - | 1.1 (1.1;1.2) | - | 1.1 (0.9;1.8) | - | 1.1 (0.9;1.6) | - | 1.1 (1.0;1.7) |
| 13 | Creatinine [µmol/l] | - | 77 (50;115) | 65 (39; 232) | 60 (44;101) | 84 (59; 159) | 87 (49;248) | 81 (65; 133) | 85 (50;780) |
| 14 | Albumin [g/l] | - | 50 (43;57) | 41 (26;49) | 38 (20;50) | 43 (35;49) | 45 (26;51) | 38 (34; 50) | 42 (26;50) |
| 15 | Fecal calprotectin [µg/g] | - | - | 59 (1;2844) | 123 (6;4513) | 30 (1;1945) | 275 (6;4821) | 55 (10;832) | 564 (14;4301) |
| 16 | NANCY_max [-] | - | - | - | 2 (0;4) | - | 2 (0;4) | - | 2 (0;4) |
| 17 | eMayo [-] | - | - | - | 1 (0;2) | - | 1 (0;3) | - | 1 (0;2) |
| 18 | Mayo_DAI [-] | - | - | - | 2 (0;5) | - | 1 (0;8) | - | 2 (0;7) |
| 19 | Mayo_PSC risk score [-] | - | - | 0.3 (;2.2;3.5) | 0.3 (;2.2;3.5) | 0.3 (;1.1;2.2) | - | 1.0 (;1.4;3.1) | - |
| 20 | AOM_score [-] | - | - | 1.9 (0.5;3.8) | 2 (1.1;5.0) | 1.9 (0.8;2.7) | - | 2.2 (1.2;3.5) | - |
| 21 | APRI_score [-] | - | 0.2 (0.1;1.5) | 0.7 (0.1;5.5) | 1.1 (0.3;14.3) | 0.5 (0.2;2.3) | 0.3 (0.1;1.5) | 0.7 (0.2;3.6) | 0.4 (0.1;1.4) |
| 22 | FIB-4_score [-] | - | 0.8 (0.4;3.4) | 1.1 (0.8;8.6) | 1.6 (0.4;23.1) | 1.5 (0.6;3.6) | 1.1 (0.3; 5.0) | 2.2 (0.5;5.8) | 1.1 (0.5;3.1) |
| 23 | MELD_score [-] | - | 7.8 (7.1;8.6) | - | 8.2 (6.4;18.0) | - | 8.0 (6.4;14.6) | - | 7.8 (6.5;22.8) |
| 24 | Platelets [10^9/l] | - | 267 (145;400) | 265 (37; 712) | 201 (20;477) | 197 (101;434) | 208 (41;442) | 212 (69;409) | 213 (83;416) |
1.3 Data Import
Importing ASV, taxa and metadata tables for both Czech and Norway samples.
Czech
path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
check.names = FALSE))Norway
path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
check.names = FALSE))1.4 Merging
Merging two countries based on the different matrices - Ileum, Colon.
Terminal ileum
ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="TI",Q="Q1")Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]Reads statistics
summary(colSums(ileum_asv_tab[,-1])) Min. 1st Qu. Median Mean 3rd Qu. Max.
3441 16579 20972 24071 28473 85910
Colon
colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="colon",Q="Q1")Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]Reads statistics
summary(colSums(colon_asv_tab[,-1])) Min. 1st Qu. Median Mean 3rd Qu. Max.
63 19814 26676 30841 37553 136048
2 Data Analysis - Terminal ileum
segment="terminal_ileum"2.1 Filtering
Rules:
sequencing depth > 10000
nearZeroVar() with default settings
Rarefaction Curve
path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))load(file.path(path,"rarefaction_ileum.Rdata"))
prare <- ggrarecurve(obj=rareres,
factorNames="Country",
indexNames=c("Observe")) +
theme_bw() +
theme(axis.text=element_text(size=8), panel.grid=element_blank(),
strip.background = element_rect(colour=NA,fill="grey"),
strip.text.x = element_text(face="bold")) +
geom_vline(xintercept = seq_depth_threshold,
linetype="dashed",
color = "red") +
xlim(0, 20000)The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prareLibrary size
read_counts(ileum_asv_tab, line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.1 Sequencing depth
data_filt <- seq_depth_filtering(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
seq_depth_threshold = 10000)Removing 131 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata
seq_step <- dim(filt_ileum_asv_tab)[1]Library size
read_counts(filt_ileum_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.2 NearZeroVar
data_filt <- nearzerovar_filtering(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata)
filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]Library size
read_counts(filt_ileum_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.3 Final Counts
final_counts_filtering(ileum_asv_tab,
filt_ileum_asv_tab,
filt_ileum_metadata,
seq_step, 0, nearzero_step) %>% `colnames<-`("Count") Count
Raw data: ASVs 5132
Raw data: Samples 294
Sequencing depth filt: ASVs 5001
Prevalence filt: ASVs 0
NearZeroVar filt: ASVs 479
Filt data: ASVs 479
Filt data: Samples 281
Filt data: Patients 281
Filt data: Patients.1 0
Filtered samples 13
Matrices TI
healthy 73
non-rPSC 0
rPSC 0
pre_ltx 70
post_ltx 138
ETOH 0
2.2 Alpha diversity
Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear model.
path = "../results/Q1/alpha_diversity"Calculation
# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
alpha_ileum_taxa_tab,
alpha_ileum_metadata))
ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)
# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)alpha_div_plots <- list()
# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
Observe=ileum_mpse$Observe,
Shannon=ileum_mpse$Shannon,
Simpson=ileum_mpse$Simpson,
Pielou=ileum_mpse$Pielou,
Group=ileum_mpse$Group,
Country=ileum_mpse$Country,
Patient=ileum_mpse$Patient)
# save the result
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
row.names = FALSE)2.2.1 Plots
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha
# see the results
p_boxplot_alpha2.2.2 Linear Models
path = "../results/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))Richness
# run model
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_observe <- results_model[[1]]
results_model_observe_emeans <- results_model[[2]]
} else {
results_model_observe <- results_model
results_model_observe_emeans <- NA
}
# save the results
pc_observed <- list();
pc_observed[[segment]] <- results_model_observe# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -53.642 | 11.742 | -4.568 | 0.000 | 0.000 | *** |
| healthy , pre_ltx - CZ vs NO | 11.219 | 10.488 | 1.070 | 0.287 | 0.361 | |
| healthy vs Grouppre_ltx:CountryNO | 6.313 | 15.403 | 0.410 | 0.683 | 0.683 | |
| pre_ltx vs Grouppost_ltx | 25.623 | 11.366 | 2.254 | 0.025 | 0.057 | |
| pre_ltx , post_ltx - CZ vs NO | 17.533 | 12.615 | 1.390 | 0.166 | 0.249 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -35.933 | 15.921 | -2.257 | 0.025 | 0.057 | |
| healthy vs Grouppost_ltx | -28.019 | 9.237 | -3.033 | 0.003 | 0.012 | * |
| healthy , post_ltx - CZ vs NO | 11.219 | 11.267 | 0.996 | 0.321 | 0.361 | |
| healthy vs Grouppost_ltx:CountryNO | -29.620 | 14.629 | -2.025 | 0.044 | 0.080 |
knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 25.623 | 11.366 | 204 | 2.254 | 0.025 |
| post_ltx - pre_ltx | NO | -10.310 | 11.148 | 204 | -0.925 | 0.356 |
| post_ltx - healthy | CZ | -28.019 | 9.237 | 207 | -3.033 | 0.003 |
| post_ltx - healthy | NO | -57.639 | 11.344 | 207 | -5.081 | 0.000 |
Shannon
# run model
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_shannon <- results_model[[1]]
results_model_shannon_emeans <- results_model[[2]]
} else {
results_model_shannon <- results_model
results_model_shannon_emeans <- NA
}
# save the results
pc_shannon <- list();
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.505 | 0.159 | -3.171 | 0.002 | 0.017 | * |
| healthy , pre_ltx - CZ vs NO | 0.007 | 0.142 | 0.046 | 0.963 | 0.963 | |
| healthy vs Grouppre_ltx:CountryNO | 0.144 | 0.209 | 0.687 | 0.493 | 0.634 | |
| pre_ltx vs Grouppost_ltx | 0.372 | 0.148 | 2.514 | 0.013 | 0.049 | * |
| pre_ltx , post_ltx - CZ vs NO | 0.150 | 0.164 | 0.915 | 0.361 | 0.542 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.502 | 0.207 | -2.423 | 0.016 | 0.049 | * |
| healthy vs Grouppost_ltx | -0.134 | 0.111 | -1.200 | 0.231 | 0.416 | |
| healthy , post_ltx - CZ vs NO | 0.007 | 0.136 | 0.048 | 0.962 | 0.963 | |
| healthy vs Grouppost_ltx:CountryNO | -0.358 | 0.176 | -2.031 | 0.044 | 0.098 |
knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 0.372 | 0.148 | 204 | 2.514 | 0.013 |
| post_ltx - pre_ltx | NO | -0.130 | 0.145 | 204 | -0.898 | 0.370 |
| post_ltx - healthy | CZ | -0.134 | 0.111 | 207 | -1.200 | 0.231 |
| post_ltx - healthy | NO | -0.492 | 0.137 | 207 | -3.597 | 0.000 |
Simpson
# run model
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_simpson <- results_model[[1]]
results_model_simpson_emeans <- results_model[[2]]
} else {
results_model_simpson <- results_model
results_model_simpson_emeans <- NA
}
# save the results
pc_simpson <- list();
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.046 | 0.019 | -2.481 | 0.014 | 0.129 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.017 | -0.466 | 0.642 | 0.692 | |
| healthy vs Grouppre_ltx:CountryNO | 0.027 | 0.024 | 1.092 | 0.277 | 0.498 | |
| pre_ltx vs Grouppost_ltx | 0.037 | 0.021 | 1.763 | 0.079 | 0.238 | |
| pre_ltx , post_ltx - CZ vs NO | 0.019 | 0.023 | 0.811 | 0.418 | 0.627 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.060 | 0.029 | -2.033 | 0.043 | 0.195 | |
| healthy vs Grouppost_ltx | -0.009 | 0.016 | -0.572 | 0.568 | 0.692 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.396 | 0.692 | 0.692 | |
| healthy vs Grouppost_ltx:CountryNO | -0.033 | 0.025 | -1.303 | 0.194 | 0.436 |
knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 0.037 | 0.021 | 204 | 1.763 | 0.079 |
| post_ltx - pre_ltx | NO | -0.023 | 0.021 | 204 | -1.106 | 0.270 |
Pielou
# run model
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_pielou <- results_model[[1]]
results_model_pielou_emeans <- results_model[[2]]
} else {
results_model_pielou <- results_model
results_model_pielou_emeans <- NA
}
# save the results
pc_pielou <- list();
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.040 | 0.024 | -1.708 | 0.090 | 0.285 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.021 | -0.394 | 0.694 | 0.943 | |
| healthy vs Grouppre_ltx:CountryNO | 0.010 | 0.031 | 0.332 | 0.740 | 0.943 | |
| pre_ltx vs Grouppost_ltx | 0.042 | 0.022 | 1.922 | 0.056 | 0.285 | |
| pre_ltx , post_ltx - CZ vs NO | 0.002 | 0.024 | 0.082 | 0.935 | 0.943 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.051 | 0.030 | -1.677 | 0.095 | 0.285 | |
| healthy vs Grouppost_ltx | 0.001 | 0.017 | 0.072 | 0.943 | 0.943 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.407 | 0.684 | 0.943 | |
| healthy vs Grouppost_ltx:CountryNO | -0.041 | 0.027 | -1.525 | 0.129 | 0.290 |
knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 0.042 | 0.022 | 204 | 1.922 | 0.056 |
| post_ltx - pre_ltx | NO | -0.009 | 0.021 | 204 | -0.435 | 0.664 |
2.2.3 Saving results
alpha_list <- list(
Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
write.xlsx(alpha_list,
file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))2.3 Beta diversity
Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.
2.3.1 Main analysis
Genus level, Aitchison distance
level="genus"
path = "../results/Q1/beta_diversity"pairwise_aitchison_raw <- list()
pca_plots_list <- list()Aggregation, filtering
# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level=level,
names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
ileum_metadata,
seq_depth_threshold=10000)Removing 5 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]2.3.1.0.1 PERMANOVA
# prepare dataset
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "robust.aitchison",
p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 695.306 | 4.259 | 0.029 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 309.768 | 1.811 | 0.009 | 0.003 | 0.003 | ** |
| post_ltx vs healthy | 1 | 914.409 | 5.454 | 0.025 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 482.074 | 2.953 | 0.020 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 699.075 | 4.087 | 0.019 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 728.020 | 4.342 | 0.020 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 237.598 | 1.460 | 0.010 | 0.010 | 0.030 | * |
| pre_ltx vs post_ltx : Country | 1 | 237.265 | 1.390 | 0.007 | 0.021 | 0.032 | * |
| post_ltx vs healthy : Country | 1 | 202.888 | 1.211 | 0.006 | 0.106 | 0.106 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2)
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 275.1 0.02844 1.7272 0.003 0.003
Residual 59 9396.0 0.97156
Total 60 9671.1 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 657.8 0.0474 3.9806 0.001 0.0013333
Residual 80 13220.8 0.9526
Total 81 13878.7 1.0000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 362.6 0.03086 2.165 0.001 0.0013333
Residual 68 11387.2 0.96914
Total 69 11749.8 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 357.1 0.03082 2.2579 0.001 0.0013333
Residual 71 11229.6 0.96918
Total 72 11586.7 1.00000
$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 184.3 0.0085 1.0629 0.314 0.314
Residual 124 21497.3 0.9915
Total 125 21681.6 1.0000
$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 362.8 0.02649 2.1767 0.001 0.0013333
Residual 80 13332.7 0.97351
Total 81 13695.4 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 362.6 0.03086 2.165 0.001 0.0013333
Residual 68 11387.2 0.96914
Total 69 11749.8 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 573.8 0.02389 3.3288 0.001 0.0013333
Residual 136 23442.8 0.97611
Total 137 24016.6 1.00000
2.3.1.0.2 Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p
# see the results
p2.3.1.1 Saving results
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]],
file = file.path(path,
paste0("beta_diversity_results_", segment,".xlsx")))2.3.2 Supplementary analysis
supplements_beta <- list()2.3.2.1 Genus level
level="genus"2.3.2.1.1 Bray-Curtis
PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "bray",
p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 0.963 | 5.058 | 0.033 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 0.342 | 1.581 | 0.007 | 0.070 | 0.070 | |
| post_ltx vs healthy | 1 | 1.551 | 7.718 | 0.034 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 1.132 | 5.948 | 0.039 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1.991 | 9.200 | 0.042 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1.656 | 8.242 | 0.037 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.449 | 2.380 | 0.016 | 0.002 | 0.006 | ** |
| pre_ltx vs post_ltx : Country | 1 | 0.330 | 1.527 | 0.007 | 0.075 | 0.075 | |
| post_ltx vs healthy : Country | 1 | 0.376 | 1.878 | 0.008 | 0.013 | 0.020 | * |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'bray')
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.4144 0.0363 2.2222 0.006 0.006
Residual 59 11.0035 0.9637
Total 60 11.4180 1.0000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.9972 0.06156 5.2483 0.001 0.0013333
Residual 80 15.2005 0.93844
Total 81 16.1977 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.9344 0.06093 4.4123 0.001 0.0013333
Residual 68 14.4010 0.93907
Total 69 15.3354 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.6466 0.05193 3.8893 0.001 0.0013333
Residual 71 11.8030 0.94807
Total 72 12.4496 1.00000
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.7698 0.02676 3.7676 0.001 0.001
Residual 137 27.9925 0.97324
Total 138 28.7623 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.1573 0.0793 6.0291 0.001 0.001
Residual 70 13.4368 0.9207
Total 71 14.5941 1.0000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.3859 0.04469 6.3619 0.001 0.001
Residual 136 29.6263 0.95531
Total 137 31.0121 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.6466 0.05193 3.8893 0.001 0.001
Residual 71 11.8030 0.94807
Total 72 12.4496 1.00000
Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p2.3.2.1.2 Jaccard
PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "jaccard",
p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1.087 | 3.929 | 0.026 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 0.439 | 1.457 | 0.007 | 0.045 | 0.045 | * |
| post_ltx vs healthy | 1 | 1.633 | 5.675 | 0.026 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 1.232 | 4.452 | 0.030 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1.965 | 6.513 | 0.030 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1.751 | 6.085 | 0.028 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.525 | 1.912 | 0.013 | 0.002 | 0.006 | ** |
| pre_ltx vs post_ltx : Country | 1 | 0.405 | 1.346 | 0.006 | 0.072 | 0.072 | |
| post_ltx vs healthy : Country | 1 | 0.450 | 1.569 | 0.007 | 0.017 | 0.026 | * |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.5009 0.03035 1.8467 0.007 0.007
Residual 59 16.0018 0.96965
Total 60 16.5027 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.1117 0.04768 4.0054 0.001 0.0013333
Residual 80 22.2037 0.95232
Total 81 23.3154 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.9632 0.04568 3.2547 0.001 0.0013333
Residual 68 20.1235 0.95432
Total 69 21.0867 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.794 0.04206 3.1176 0.001 0.0013333
Residual 71 18.082 0.95794
Total 72 18.876 1.00000
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.865 0.02135 2.9894 0.001 0.001
Residual 137 39.627 0.97865
Total 138 40.491 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.2181 0.05806 4.3147 0.001 0.001
Residual 70 19.7616 0.94194
Total 71 20.9797 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.407 0.03294 4.6318 0.001 0.001
Residual 136 41.306 0.96706
Total 137 42.713 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.794 0.04206 3.1176 0.001 0.001
Residual 71 18.082 0.95794
Total 72 18.876 1.00000
Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p2.3.2.2 ASV level
level="ASV"2.3.2.2.1 Aitchison
PERMANOVA
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "robust.aitchison",
p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1010.000 | 3.749 | 0.026 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 364.559 | 1.485 | 0.007 | 0.003 | 0.003 | ** |
| post_ltx vs healthy | 1 | 1389.311 | 5.466 | 0.025 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 603.397 | 2.239 | 0.015 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 671.339 | 2.734 | 0.013 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 789.105 | 3.105 | 0.014 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 364.013 | 1.354 | 0.009 | 0.014 | 0.021 | * |
| pre_ltx vs post_ltx : Country | 1 | 352.491 | 1.439 | 0.007 | 0.007 | 0.021 | * |
| post_ltx vs healthy : Country | 1 | 281.750 | 1.109 | 0.005 | 0.170 | 0.170 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2)
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 420 0.02629 1.5929 0.004 0.004
Residual 59 15557 0.97371
Total 60 15977 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 954 0.04193 3.5009 0.001 0.002
Residual 80 21800 0.95807
Total 81 22754 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 460.2 0.02579 1.8004 0.002 0.0026667
Residual 68 17380.9 0.97421
Total 69 17841.1 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 507.2 0.02476 1.8028 0.001 0.002
Residual 71 19976.2 0.97524
Total 72 20483.4 1.00000
$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 266.4 0.00872 1.0904 0.221 0.221
Residual 124 30296.1 0.99128
Total 125 30562.5 1.00000
$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 450.6 0.02237 1.8308 0.001 0.002
Residual 80 19691.5 0.97763
Total 81 20142.1 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 460.2 0.02579 1.8004 0.002 0.0026667
Residual 68 17380.9 0.97421
Total 69 17841.1 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 564 0.01699 2.3509 0.001 0.002
Residual 136 32607 0.98301
Total 137 33170 1.00000
PCoA
p <- pca_plot_custom(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata,
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p
# see the results
p2.3.2.2.2 Bray-Curtis
PERMANOVA
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1.628 | 5.225 | 0.035 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 0.519 | 1.563 | 0.007 | 0.024 | 0.024 | * |
| post_ltx vs healthy | 1 | 2.428 | 7.667 | 0.035 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 1.136 | 3.644 | 0.024 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1.676 | 5.049 | 0.024 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1.667 | 5.264 | 0.024 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.469 | 1.511 | 0.010 | 0.027 | 0.040 | * |
| pre_ltx vs post_ltx : Country | 1 | 0.562 | 1.698 | 0.008 | 0.010 | 0.030 | * |
| post_ltx vs healthy : Country | 1 | 0.432 | 1.367 | 0.006 | 0.064 | 0.064 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'bray')
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.5756 0.02993 1.8204 0.005 0.005
Residual 59 18.6557 0.97007
Total 60 19.2313 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.5215 0.05848 4.969 0.001 0.0013333
Residual 80 24.4953 0.94152
Total 81 26.0168 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.8717 0.03713 2.6224 0.001 0.0013333
Residual 68 22.6028 0.96287
Total 69 23.4744 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.733 0.03444 2.5326 0.001 0.0013333
Residual 71 20.548 0.96556
Total 72 21.281 1.00000
$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.430 0.01027 1.2865 0.114 0.114
Residual 124 41.443 0.98973
Total 125 41.873 1.00000
$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.6509 0.02437 1.9985 0.004 0.0053333
Residual 80 26.0569 0.97563
Total 81 26.7079 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.8717 0.03713 2.6224 0.001 0.002
Residual 68 22.6028 0.96287
Total 69 23.4744 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.366 0.02953 4.1389 0.001 0.002
Residual 136 44.897 0.97047
Total 137 46.263 1.00000
PCoA
p <- pca_plot_custom(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p2.3.2.2.3 Jaccard
PERMANOVA
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "jaccard",
p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1.354 | 3.531 | 0.024 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 0.528 | 1.327 | 0.006 | 0.024 | 0.024 | * |
| post_ltx vs healthy | 1 | 1.921 | 4.964 | 0.023 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 0.994 | 2.593 | 0.018 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1.384 | 3.480 | 0.017 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1.396 | 3.607 | 0.017 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.514 | 1.344 | 0.009 | 0.020 | 0.030 | * |
| pre_ltx vs post_ltx : Country | 1 | 0.571 | 1.439 | 0.007 | 0.010 | 0.030 | * |
| post_ltx vs healthy : Country | 1 | 0.488 | 1.262 | 0.006 | 0.041 | 0.041 | * |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.5659 0.02422 1.4643 0.006 0.006
Residual 59 22.8020 0.97578
Total 60 23.3680 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.302 0.04112 3.4309 0.001 0.0013333
Residual 80 30.352 0.95888
Total 81 31.654 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.7898 0.02836 1.985 0.001 0.0013333
Residual 68 27.0550 0.97164
Total 69 27.8447 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.7183 0.02679 1.9541 0.001 0.0013333
Residual 71 26.0995 0.97321
Total 72 26.8178 1.00000
$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.468 0.00938 1.1736 0.115 0.115
Residual 124 49.489 0.99062
Total 125 49.957 1.00000
$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.63 0.01964 1.6025 0.003 0.004
Residual 80 31.47 0.98036
Total 81 32.10 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.7898 0.02836 1.985 0.001 0.002
Residual 68 27.0550 0.97164
Total 69 27.8447 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.165 0.02116 2.9397 0.001 0.002
Residual 136 53.904 0.97884
Total 137 55.069 1.00000
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.058 0.01938 2.7075 0.001 0.001
Residual 137 53.521 0.98062
Total 138 54.578 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.3507 0.04853 3.5701 0.001 0.001
Residual 70 26.4826 0.95147
Total 71 27.8333 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.165 0.02116 2.9397 0.001 0.001
Residual 136 53.904 0.97884
Total 137 55.069 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.7183 0.02679 1.9541 0.001 0.001
Residual 71 26.0995 0.97321
Total 72 26.8178 1.00000
PCoA
p <- pca_plot_custom(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p2.3.2.3 Saving results
write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
file = file.path(path,
paste0("supplements_beta_diversity_", segment,".xlsx")))2.4 Univariate Analysis
Using two DAA tools - linDA and Maaslin. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.
Main analysis is performed at genus level. In supplements, analysis is performed at ASV and phylum levels.
2.4.1 Main analysis
level="genus"
# needed paths
path = "../results/Q1/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q1",level)# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list();
linda_results_genus[[segment]] <- list()
# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()
# workbook for final df
wb <- createWorkbook()
# PSC effect
psc_effect <- list()Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
ileum_genus_taxa_tab)2.4.1.1 pre_ltx vs healthy
We can assume, that the differentially abundant taxa between these two groups are caused by PSC disease and other factors, like liver cirrhosis and overall bad clinical condition.
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.1.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 84 ASV(s)
Removing 10 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 143 samples and 184 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plots
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano2.4.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.1.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 46 ASV(s)
Removing 6 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 208 samples and 172 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)2.4.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.2.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.3 post_ltx vs healthy
We can assume, that the differentially abundant taxa between these two groups remain because of the persistent gut MB alteration due to the PSC disease. Also, transplantation itself could add to this difference, as it has definitely strong impact on microbial composition.
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 45 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 211 samples and 180 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)2.4.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.3.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
ileum_taxa_tab) + xlab("") + ylab("")min_clr -1.767585
max_clr 6.908912
min_log -5.062147
max_log 4.892498
dotheatmap_lindaHorizontal bar plot
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
ggplot() + theme_minimal(),
nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),
p_prevalence_final,
ncol=2,widths = c(1,0.3))Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_text()`).
pdot_heatmap_ileum <- p2.4.1.5 PSC effect
To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).
pre_LTx vs HC and Post_LTx vs HC intersection
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] SeqID
10 Enterorhabdus
16 Barnesiella
19 Butyricimonas
20 Odoribacter
27 Alistipes
29 Parabacteroides
41 Holdemania
64 Coprococcus
68 Fusicatenibacter
72 Lachnoclostridium
74 Lachnospiraceae_FCS020_group
105 Oscillibacter
118 Faecalibacterium
155 Pseudomonas
164 Enterococcus
Taxonomy
10 k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Enterorhabdus
16 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
19 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
20 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
29 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
41 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
64 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
68 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
72 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
74 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
105 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
118 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium
155 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
164 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
log2FoldChange p_value padj
10 -1.050549 1.087023e-02 4.545734e-02
16 -3.177161 1.263665e-03 7.750481e-03
19 -2.900425 6.571752e-05 9.301556e-04
20 -2.971657 3.696980e-05 6.184040e-04
27 -2.559742 5.731016e-04 4.336862e-03
29 -3.418185 1.366753e-05 2.794251e-04
41 -1.213660 1.082333e-02 4.545734e-02
64 -3.868534 6.688508e-07 3.076714e-05
68 -2.864295 5.224510e-04 4.336862e-03
72 -1.781011 1.555333e-03 9.231654e-03
74 -2.231945 8.971276e-04 6.113758e-03
105 -1.970465 2.772265e-03 1.416936e-02
118 -3.360183 8.947299e-06 2.351861e-04
155 1.573079 8.758249e-03 4.028794e-02
164 4.892498 4.554602e-10 8.380467e-08
2.4.1.6 Saving results
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
overwrite = TRUE)
# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("significant_taxa_",segment,".xlsx")))2.4.2 Supplementary Analysis
supplements_uni <- list()
supplements_wb <- createWorkbook()2.4.2.1 ASV level
level="ASV"path_maaslin="../intermediate_files/maaslin/Q1/ASV/"raw_linda_results <- list();
raw_linda_results[[segment]] <- list()
linda_results <- list();
linda_results[[segment]] <- list()2.4.2.1.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 1598 ASV(s)
Removing 146 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 143 samples and 453 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.2.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 979 ASV(s)
Removing 68 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 208 samples and 349 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.2.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 641 ASV(s)
Removing 104 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 211 samples and 444 features will be tested!
Pseudo-count approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.2.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
ileum_taxa_tab)min_clr -1.065303
max_clr 4.679688
min_log -5.165415
max_log 4.224397
dotheatmap_linda2.4.2.1.5 PSC effect
pre_LTx vs Healthy and Post_LTx vs Healthy intersection
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] SeqID
22 TAGGGAATTTTCGGCAATGGGGGAAACCCTGACCGAGCAACGCCGCGTGAAGGAAGAAGTAATTCGTTATGTAAACTTCTGTCATAGAGGAAGAACGGTGGATATAGGGAATGATATCCAAGTGACGGTACTCTATAAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCACTAAGGGTCTGTGGTGAAAGATCGAAGCTTAACTTCGGTAAGCCATGGAAACCGTAGAGCTAGAGTGTGTGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCGCATAACTGACGCTC
39 TGAGGAATATTGGGCAATGGAGGAAACTCTGACCCAGCCATGCCGCGTGAGTGAAGAAGGTTTTCGAATTGTAAAGCTCTTTCGGATGGGACGATGATGACGGTACCATCTAAAGAAGCCCCGGCAAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGTGTGTAGGCGGATTTGTAAGATAGTGGTGAAATACCTGAGCTCAACTTAGGAATTGCCATTATAACTATAGATCTGGAGTGACAGAGAGGATATTGGAATACCCAGTGTAGAGGTGAAATTCGTAGATATTGGGTAGAACACCAGTGGCGAAGGCGAGTATCTGGCTGTCAACTGACGCTGAGGCACGAAAGCATGGGGATCAAA
70 TGAGGAATATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGCAGGATGACGGCTCTATGAGTTGTAAACTGCTTTTGTACGAGGGTAAACGCAGATACGTGTATCTGTCTGAAAGTATCGTACGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATTTCGGGGCTCAACCCTGAACGTGCCTCTAATACTGTTGAGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGC
83 TGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGAAGGCTCTATGGGTCGTAAACTTCTTTTATATGGGAATAAAGTTTTCCACGTGTGGAATTTTGTATGTACCATATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTAGACTGTCACTGACACTGATGC
133 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCACTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGAGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
134 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGCAAGTCTGAAGTGAAAGGCATGGGCTCAACCTGTGGACTGCTTTGGAAACTGTGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
139 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCGATGCAAGTCTGGAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTATGGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
140 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAAGCAAGTCTGAAGTGAAAACCCAGGGCTCAACCCTGGGACTGCTTTGGAAACTGTTTTGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
141 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTAAAGCAAGTCTGAAGTGAAAGCCCGCGGCTCAACTGCGGGACTGCTTTGGAAACTGTTTAACTGGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGACTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
166 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAGCGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGAATGGCTTTGGAAACTGTGCAGCTAGAGTACCGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
169 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGACAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGCCTTGCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
196 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCATTGCAAGCCAGATGTGAAAGCCCGGGGCTCAACCCCGGGACTGCATTTGGAACTGTAGAGCTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
198 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTATGGCAAGTCTGATGTGAAAGGCCAGGGCTCAACCCTGGGACTGCATTGGAAACTGTCGAACTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
207 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCACCGGCTAAATACGTGCCAGCAGCCGCGGTAATACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGTAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTATGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCCAGTGTAGCGGTGAAATGCGTAGATATTGGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
212 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGGCAAGTCTGATGTGAAAACCCGGGGCTCAACCCCGTGACTGCATTGGAAACTGTTTTGCTTGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGCAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
220 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGATAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAAGGCAAGTCTGATGTGAAAACCCAGGGCTTAACCCTGGGACTGCATTGGAAACTGTCTGGCTCGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
240 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTCAAGCAAGTCAGAAGTGAAAGGCTGGGGCTCAACCCCGGGACTGCTTTTGAAACTGTTTGACTGGAGTGCTGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
270 TGGGGAATATTGCGCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGATTGAAGAAGGCCTTCGGGTTGTAAAGATCTTTAATCAGGGACGAAACAAATGACGGTACCTGAAGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGCGCAGGCGGGCCGGTAAGTTGGAAGTGAAATCTATGGGCTTAACCCATAAACTGCTTTCAAAACTGCTGGTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
283 TGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTGTCAGGGAAGAGTAGAAGACGGTACCTGACGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGGCCGGCAAGTCAGATGTGAAATCTGGAGGCTTAACCTCCAAACTGCATTTGAAACTGTAGGTCTTGAGTACCGGAGAGGTTATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATAAGGAAGAACACCAGTGGCGAAGGCGGATAACTGGACGGCAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCA
301 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTATGAGGGACGAAGGAAGTGACGGTACCTCATGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCGAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGGAAGCAAGTCAGATGTGAAAACCAGTGGCTCAACCACTGGCCTGCATTTGAAACTGTTTTTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
305 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTCTCAGGGACGAACAAATGACGGTACCTGAGGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGAAGGCAAGTCAGATGTGAAAACTATGGGCTCAACCCATAGCCTGCATTTGAAACTGTTTTTCTTGAGTGCTGGAGAGGCAATCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGATTGCTGGACAGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
338 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAAAGGCCTTCTTGCGAATAGTGAGAAGGATTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATAGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
374 TGGGGGATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGACGCAAGTCTGAAGTGAAATACCCGGGCTCAACCTGGGAACTGCTTTGGAAACTGTGTTGCTAGAGTGCTGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
401 TAGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTAGAGAAGAACAAGGATGAGAGTAACTGTTCATCCCTTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCT
426 TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGCGGGATGGAGGCCTTCGGGTTGTAAACCGCTTTTGATCGGGAGCAAGCCCTTCGGGGTGAGTGTACCTTTCGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCTCGTAGGCGGTTCGTCGCGTCCGGTGTGAAAGCCCATCGCTTAACGGTGGGTCTGCGCCGGGTACGGGCGGGCTGGAGTGCGGTAGGGGAGACTGGAATTCCCGGTGTAACGGTGGAATGTGTAGATATCGGGAAGAACACCAATGGCGAAGGCAGGTCTCTGGGCCGTCACTGACGCTGAGGAGCGAAAGCGTG
Taxonomy
22 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelatoclostridiaceae;g__Erysipelotrichaceae_UCG-003;s__unassigned
39 k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__uncultured;g__uncultured;s__unassigned
70 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__unassigned
83 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__unassigned
133 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
134 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
139 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
140 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
141 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
166 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
169 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
196 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
198 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
207 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__unassigned
212 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__CAG-56;s__unassigned
220 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter;s__unassigned
240 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
270 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus;s__unassigned
283 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter;s__unassigned
301 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__uncultured;s__unassigned
305 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas;s__unassigned
338 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
374 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_UCG-004;s__unassigned
401 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus;s__unassigned
426 k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Bifidobacteriales;f__Bifidobacteriaceae;g__Bifidobacterium;s__unassigned
log2FoldChange p_value padj
22 -4.4298696 1.676032e-06 1.265404e-04
39 -2.0538259 2.203655e-03 2.697988e-02
70 -3.5115154 1.659708e-04 4.699048e-03
83 -1.9687275 1.190853e-04 4.149666e-03
133 -5.1654153 4.258197e-07 3.857926e-05
134 -4.4957248 1.521108e-08 3.757406e-06
139 -2.9269443 4.470062e-04 8.099752e-03
140 3.8241704 8.725084e-06 4.391626e-04
141 3.2182184 1.288509e-04 4.169248e-03
166 -2.5678980 8.934631e-04 1.349129e-02
169 -5.0632949 2.091495e-07 2.368618e-05
196 -1.7469483 3.668450e-03 3.661578e-02
198 -2.4204302 2.028649e-04 5.105433e-03
207 -5.0455471 1.658899e-08 3.757406e-06
212 -3.0907151 1.418501e-03 1.835946e-02
220 -2.8612176 8.567099e-04 1.349129e-02
240 -2.1804819 8.890519e-04 1.349129e-02
270 -2.1661995 3.862513e-03 3.722805e-02
283 -2.8158488 2.758436e-05 1.135974e-03
301 -2.1038157 4.028492e-03 3.801889e-02
305 -1.9220557 1.012676e-03 1.433569e-02
338 2.3054457 3.730015e-04 7.756897e-03
374 -2.9436340 6.863898e-05 2.591121e-03
401 3.9907909 3.116171e-08 4.705418e-06
426 0.8645068 5.884295e-03 4.759975e-02
2.4.2.2 Phylum level
level="phylum"path_maaslin="../intermediate_files/maaslin/Q1/Phylum/"raw_linda_results_phylum <- list();
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum <- list();
linda_results_phylum[[segment]] <- list()Aggregate taxa
phylum_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = "Phylum")
ileum_phylum_tab <- phylum_data[[1]]
ileum_phylum_taxa_tab <- phylum_data[[2]]2.4.2.2.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
ileum_phylum_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 143 samples and 10 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_phylum[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_phylum[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)Using Phylum for naming
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ") Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_phylum,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_phylum[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.2.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
ileum_phylum_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 208 samples and 10 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_phylum[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_phylum[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)Using Phylum for naming
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ") Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_phylum,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_phylum[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.2.2.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
ileum_phylum_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 1 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 211 samples and 10 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_phylum[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_phylum[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)Using Phylum for naming
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ") Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_phylum,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_phylum[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.2.2.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
ileum_taxa_tab)min_clr -4.119052
max_clr 3.1473
min_log -2.944293
max_log 2.82227
dotheatmap_linda2.4.2.2.5 PSC effect
pre_LTx vs Healthy and Post_LTx vs Healthy intersection
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] [1] SeqID Taxonomy log2FoldChange p_value padj
<0 rows> (or 0-length row.names)
2.4.2.3 Saving results
# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)
# PSC effect
write.xlsx(psc_effect,
file.path(path,paste0("supplements_psc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))2.5 Machine learning
Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.
The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.
path = "../results/Q1/models"2.5.1 ENET
model="enet"2.5.1.1 ASV level
level="ASV"2.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.4000000
lambda 0.0256910
auc 1.0000000
auc_czech 1.0000000
auc_no 1.0000000
auc_optimism_corrected 0.9562938
auc_optimism_corrected_CIL 0.8940903
auc_optimism_corrected_CIU 0.9955143
accuracy 1.0000000
accuracy_czech NaN
accuracy_no 1.0000000
accuracy_optimism_corrected 0.8824810
accuracy_optimism_corrected_CIL 0.7808636
accuracy_optimism_corrected_CIU 0.9600000
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 70
$czech
Predicted
True 0 1
0 37 0
1 0 24
$no
Predicted
True 0 1
0 36 0
1 0 46
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.2000000
lambda 0.1292328
auc 0.9967909
auc_czech 0.9979575
auc_no 0.9969807
auc_optimism_corrected 0.8645187
auc_optimism_corrected_CIL 0.7852767
auc_optimism_corrected_CIU 0.9318161
accuracy 0.9278846
accuracy_czech NaN
accuracy_no 0.9512195
accuracy_optimism_corrected 0.8220597
accuracy_optimism_corrected_CIL 0.7333333
accuracy_optimism_corrected_CIU 0.8924940
enet_model$conf_matrices$original
Predicted
True 0 1
0 138 0
1 15 55
$czech
Predicted
True 0 1
0 102 0
1 11 13
$no
Predicted
True 0 1
0 36 0
1 4 42
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.20000000
lambda 0.04799701
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.95799961
auc_optimism_corrected_CIL 0.89999175
auc_optimism_corrected_CIU 0.98977781
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.89383497
accuracy_optimism_corrected_CIL 0.82118019
accuracy_optimism_corrected_CIU 0.96028870
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 138
$czech
Predicted
True 0 1
0 37 0
1 0 102
$no
Predicted
True 0 1
0 36 0
1 0 36
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]2.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.20000000
lambda 0.03984448
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.96582239
auc_optimism_corrected_CIL 0.91575521
auc_optimism_corrected_CIU 0.99568966
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.89754798
accuracy_optimism_corrected_CIL 0.80769231
accuracy_optimism_corrected_CIU 0.97141139
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 70
$czech
Predicted
True 0 1
0 37 0
1 0 24
$no
Predicted
True 0 1
0 36 0
1 0 46
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group", N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.0000000
lambda 0.6674012
auc 0.9726708
auc_czech 0.9616013
auc_no 0.9800725
auc_optimism_corrected 0.8344709
auc_optimism_corrected_CIL 0.7487803
auc_optimism_corrected_CIU 0.9154473
accuracy 0.8653846
accuracy_czech NaN
accuracy_no 0.8780488
accuracy_optimism_corrected 0.7788698
accuracy_optimism_corrected_CIL 0.6973684
accuracy_optimism_corrected_CIU 0.8516141
enet_model$conf_matrices$original
Predicted
True 0 1
0 136 2
1 26 44
$czech
Predicted
True 0 1
0 101 1
1 17 7
$no
Predicted
True 0 1
0 35 1
1 9 37
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,group,
usage="ml_clr")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.07097722
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.96566611
auc_optimism_corrected_CIL 0.92200341
auc_optimism_corrected_CIU 0.99386451
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.90031834
accuracy_optimism_corrected_CIL 0.82800926
accuracy_optimism_corrected_CIU 0.96054545
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 138
$czech
Predicted
True 0 1
0 37 0
1 0 102
$no
Predicted
True 0 1
0 36 0
1 0 36
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.1.3 Saving results
models_summ_df_ileum <- do.call(rbind,
models_summ[grep(segment,names(models_summ),value = TRUE)])
write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))2.5.2 Supplementary models
supplements_models <- list()2.5.2.1 CLR-transformed data
2.5.2.1.1 kNN
model="knn"2.5.2.1.1.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 18.0000000
auc 0.9239726
auc_optimism_corrected 0.8707255
auc_optimism_corrected_CIL 0.7611443
auc_optimism_corrected_CIU 0.9538139
accuracy 0.7762238
accuracy_optimism_corrected 0.7606251
accuracy_optimism_corrected_CIL 0.6400000
accuracy_optimism_corrected_CIU 0.8714046
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 16.0000000
auc 0.8342650
auc_optimism_corrected 0.6861098
auc_optimism_corrected_CIL 0.5546110
auc_optimism_corrected_CIU 0.7936124
accuracy 0.7067308
accuracy_optimism_corrected 0.6707627
accuracy_optimism_corrected_CIL 0.5697674
accuracy_optimism_corrected_CIU 0.7682297
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 27.0000000
auc 0.9100159
auc_optimism_corrected 0.8695718
auc_optimism_corrected_CIL 0.7804090
auc_optimism_corrected_CIU 0.9534194
accuracy 0.8720379
accuracy_optimism_corrected 0.8197023
accuracy_optimism_corrected_CIL 0.7302235
accuracy_optimism_corrected_CIU 0.8972568
roc_c2.5.2.1.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
# see the results
knn_model$model_summary %>% t() [,1]
k 12.0000000
auc 0.9568493
auc_optimism_corrected 0.8867898
auc_optimism_corrected_CIL 0.7995277
auc_optimism_corrected_CIU 0.9600000
accuracy 0.7692308
accuracy_optimism_corrected 0.7505565
accuracy_optimism_corrected_CIL 0.6074755
accuracy_optimism_corrected_CIU 0.8600000
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 17.0000000
auc 0.8493789
auc_optimism_corrected 0.7274604
auc_optimism_corrected_CIL 0.6082378
auc_optimism_corrected_CIU 0.8332776
accuracy 0.7692308
accuracy_optimism_corrected 0.6961743
accuracy_optimism_corrected_CIL 0.5961806
accuracy_optimism_corrected_CIU 0.7950340
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 12.0000000
auc 0.9638674
auc_optimism_corrected 0.9214737
auc_optimism_corrected_CIL 0.8567085
auc_optimism_corrected_CIU 0.9739614
accuracy 0.8530806
accuracy_optimism_corrected 0.7997011
accuracy_optimism_corrected_CIL 0.6820732
accuracy_optimism_corrected_CIU 0.8987821
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.2.1.2 RF
model="rf"2.5.2.1.2.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "105"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.946523"
auc_optimism_corrected_CIL "0.8726572"
auc_optimism_corrected_CIU "0.9949915"
accuracy "1"
accuracy_optimism_corrected "0.868117"
accuracy_optimism_corrected_CIL "0.7609048"
accuracy_optimism_corrected_CIU "0.960024"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "115"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.8555864"
auc_optimism_corrected_CIL "0.7560028"
auc_optimism_corrected_CIU "0.941253"
accuracy "1"
accuracy_optimism_corrected "0.8330184"
accuracy_optimism_corrected_CIL "0.7464789"
accuracy_optimism_corrected_CIU "0.9102992"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "115"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.8996574"
auc_optimism_corrected_CIL "0.8172801"
auc_optimism_corrected_CIU "0.9705953"
accuracy "1"
accuracy_optimism_corrected "0.874331"
accuracy_optimism_corrected_CIL "0.7870625"
accuracy_optimism_corrected_CIU "0.9412131"
roc_c2.5.2.1.2.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "47"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.9495441"
auc_optimism_corrected_CIL "0.8771007"
auc_optimism_corrected_CIU "0.9959453"
accuracy "1"
accuracy_optimism_corrected "0.8764871"
accuracy_optimism_corrected_CIL "0.7804167"
accuracy_optimism_corrected_CIU "0.9615385"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "57"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.8244378"
auc_optimism_corrected_CIL "0.7188584"
auc_optimism_corrected_CIU "0.9062878"
accuracy "1"
accuracy_optimism_corrected "0.79629"
accuracy_optimism_corrected_CIL "0.7037701"
accuracy_optimism_corrected_CIU "0.8726469"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "133"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9117739"
auc_optimism_corrected_CIL "0.8210604"
auc_optimism_corrected_CIU "0.978767"
accuracy "1"
accuracy_optimism_corrected "0.8682508"
accuracy_optimism_corrected_CIL "0.7820513"
accuracy_optimism_corrected_CIU "0.9421487"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.2.1.3 Gradient boosting
model="gb"2.5.2.1.3.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 1.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9548905
auc_optimism_corrected_CIL 0.9022039
auc_optimism_corrected_CIU 0.9919377
accuracy 1.0000000
accuracy_optimism_corrected 0.8793789
accuracy_optimism_corrected_CIL 0.7884615
accuracy_optimism_corrected_CIU 0.9604118
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 0.9998965
auc_optimism_corrected 0.8678054
auc_optimism_corrected_CIL 0.7868984
auc_optimism_corrected_CIU 0.9351204
accuracy 0.9903846
accuracy_optimism_corrected 0.8340986
accuracy_optimism_corrected_CIL 0.7482500
accuracy_optimism_corrected_CIU 0.9085227
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9250994
auc_optimism_corrected_CIL 0.8522212
auc_optimism_corrected_CIU 0.9818952
accuracy 1.0000000
accuracy_optimism_corrected 0.8766714
accuracy_optimism_corrected_CIL 0.8050354
accuracy_optimism_corrected_CIU 0.9358974
roc_c2.5.2.1.3.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 1.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9727397
auc_optimism_corrected_CIL 0.9275174
auc_optimism_corrected_CIU 1.0000000
accuracy 1.0000000
accuracy_optimism_corrected 0.9037504
accuracy_optimism_corrected_CIL 0.8155329
accuracy_optimism_corrected_CIU 0.9719540
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 1.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 0.9909938
auc_optimism_corrected 0.8303197
auc_optimism_corrected_CIL 0.7403750
auc_optimism_corrected_CIU 0.9115621
accuracy 0.9278846
accuracy_optimism_corrected 0.7987144
accuracy_optimism_corrected_CIL 0.7053850
accuracy_optimism_corrected_CIU 0.8767736
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 200.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9369762
auc_optimism_corrected_CIL 0.8678590
auc_optimism_corrected_CIU 0.9834064
accuracy 1.0000000
accuracy_optimism_corrected 0.8813001
accuracy_optimism_corrected_CIL 0.8133333
accuracy_optimism_corrected_CIU 0.9448440
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.2.2 Relative abundances
2.5.2.2.1 Elastic net
2.5.2.2.1.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 1.00000000
lambda 0.06282139
auc 0.99726027
auc_czech 0.99662162
auc_no 0.99818841
auc_optimism_corrected 0.92410231
auc_optimism_corrected_CIL 0.83677721
auc_optimism_corrected_CIU 0.97242424
accuracy 0.93706294
accuracy_czech NaN
accuracy_no 0.93902439
accuracy_optimism_corrected 0.85997837
accuracy_optimism_corrected_CIL 0.77602041
accuracy_optimism_corrected_CIU 0.91766690
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.0000000
lambda 3.1074497
auc 0.9894410
auc_czech 0.9950980
auc_no 0.9734300
auc_optimism_corrected 0.7340394
auc_optimism_corrected_CIL 0.6170852
auc_optimism_corrected_CIU 0.8336118
accuracy 0.7259615
accuracy_czech NaN
accuracy_no 0.5243902
accuracy_optimism_corrected 0.6974512
accuracy_optimism_corrected_CIL 0.5967411
accuracy_optimism_corrected_CIU 0.8050044
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.2000000
lambda 0.0941553
auc 1.0000000
auc_czech 1.0000000
auc_no 1.0000000
auc_optimism_corrected 0.8878147
auc_optimism_corrected_CIL 0.8182230
auc_optimism_corrected_CIU 0.9395208
accuracy 0.9620853
accuracy_czech NaN
accuracy_no 0.9861111
accuracy_optimism_corrected 0.8346693
accuracy_optimism_corrected_CIL 0.7516067
accuracy_optimism_corrected_CIU 0.9093405
roc_c2.5.2.2.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.2000000
lambda 0.1357028
auc 0.9982387
auc_czech 0.9954955
auc_no 1.0000000
auc_optimism_corrected 0.9386177
auc_optimism_corrected_CIL 0.8965000
auc_optimism_corrected_CIU 0.9793945
accuracy 0.9860140
accuracy_czech NaN
accuracy_no 1.0000000
accuracy_optimism_corrected 0.8663045
accuracy_optimism_corrected_CIL 0.8016509
accuracy_optimism_corrected_CIU 0.9358844
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.40000000
lambda 0.08652139
auc 0.94720497
auc_czech 0.95751634
auc_no 0.88828502
auc_optimism_corrected 0.73722094
auc_optimism_corrected_CIL 0.65355232
auc_optimism_corrected_CIU 0.81212375
accuracy 0.82211538
accuracy_czech NaN
accuracy_no 0.73170732
accuracy_optimism_corrected 0.71869940
accuracy_optimism_corrected_CIL 0.61986111
accuracy_optimism_corrected_CIU 0.79628473
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.2000000
lambda 0.1352860
auc 0.9857058
auc_czech 0.9809221
auc_no 0.9915123
auc_optimism_corrected 0.8571252
auc_optimism_corrected_CIL 0.8014650
auc_optimism_corrected_CIU 0.8939396
accuracy 0.9099526
accuracy_czech NaN
accuracy_no 0.9583333
accuracy_optimism_corrected 0.7924925
accuracy_optimism_corrected_CIL 0.7205995
accuracy_optimism_corrected_CIU 0.8558642
roc_c2.5.2.2.2 kNN
2.5.2.2.2.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 10.0000000
auc 0.9155577
auc_optimism_corrected 0.8288216
auc_optimism_corrected_CIL 0.6990440
auc_optimism_corrected_CIU 0.9324125
accuracy 0.6993007
accuracy_optimism_corrected 0.6665810
accuracy_optimism_corrected_CIL 0.5590566
accuracy_optimism_corrected_CIU 0.7549259
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 20.0000000
auc 0.8137164
auc_optimism_corrected 0.6656270
auc_optimism_corrected_CIL 0.5828798
auc_optimism_corrected_CIU 0.7357973
accuracy 0.6778846
accuracy_optimism_corrected 0.6814455
accuracy_optimism_corrected_CIL 0.6193304
accuracy_optimism_corrected_CIU 0.7416264
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 15.0000000
auc 0.8773576
auc_optimism_corrected 0.8241731
auc_optimism_corrected_CIL 0.7426389
auc_optimism_corrected_CIU 0.9093236
accuracy 0.8578199
accuracy_optimism_corrected 0.7672256
accuracy_optimism_corrected_CIL 0.6851333
accuracy_optimism_corrected_CIU 0.8682099
roc_c2.5.2.2.2.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 11.0000000
auc 0.9076321
auc_optimism_corrected 0.8187868
auc_optimism_corrected_CIL 0.7649740
auc_optimism_corrected_CIU 0.8737386
accuracy 0.7622378
accuracy_optimism_corrected 0.7045763
accuracy_optimism_corrected_CIL 0.6121154
accuracy_optimism_corrected_CIU 0.7918367
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 19.0000000
auc 0.8088509
auc_optimism_corrected 0.7182619
auc_optimism_corrected_CIL 0.6604457
auc_optimism_corrected_CIU 0.7917652
accuracy 0.7403846
accuracy_optimism_corrected 0.7189495
accuracy_optimism_corrected_CIL 0.6446429
accuracy_optimism_corrected_CIU 0.7566076
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 21.0000000
auc 0.8874330
auc_optimism_corrected 0.8009876
auc_optimism_corrected_CIL 0.7335368
auc_optimism_corrected_CIU 0.8864556
accuracy 0.8151659
accuracy_optimism_corrected 0.7465345
accuracy_optimism_corrected_CIL 0.6166738
accuracy_optimism_corrected_CIU 0.8089339
roc_c2.5.2.2.3 RF
2.5.2.2.3.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "71"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.9749018"
auc_optimism_corrected_CIL "0.9501443"
auc_optimism_corrected_CIU "0.9929425"
accuracy "1"
accuracy_optimism_corrected "0.8954299"
accuracy_optimism_corrected_CIL "0.8438571"
accuracy_optimism_corrected_CIU "0.930791"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "31"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.901005"
auc_optimism_corrected_CIL "0.8504941"
auc_optimism_corrected_CIU "0.9534551"
accuracy "1"
accuracy_optimism_corrected "0.8441077"
accuracy_optimism_corrected_CIL "0.785"
accuracy_optimism_corrected_CIU "0.8924805"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "17"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.9248253"
auc_optimism_corrected_CIL "0.8775136"
auc_optimism_corrected_CIU "0.9641348"
accuracy "1"
accuracy_optimism_corrected "0.875495"
accuracy_optimism_corrected_CIL "0.8132086"
accuracy_optimism_corrected_CIU "0.9172763"
roc_c2.5.2.2.3.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "17"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.9665056"
auc_optimism_corrected_CIL "0.9449662"
auc_optimism_corrected_CIU "0.9863349"
accuracy "1"
accuracy_optimism_corrected "0.902696"
accuracy_optimism_corrected_CIL "0.8441384"
accuracy_optimism_corrected_CIU "0.9399633"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "85"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.8604571"
auc_optimism_corrected_CIL "0.824759"
auc_optimism_corrected_CIU "0.9035517"
accuracy "1"
accuracy_optimism_corrected "0.8070124"
accuracy_optimism_corrected_CIL "0.758049"
accuracy_optimism_corrected_CIU "0.8571429"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "31"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.9158303"
auc_optimism_corrected_CIL "0.8472241"
auc_optimism_corrected_CIU "0.9820129"
accuracy "1"
accuracy_optimism_corrected "0.8678365"
accuracy_optimism_corrected_CIL "0.8040909"
accuracy_optimism_corrected_CIU "0.9498863"
roc_c2.5.2.2.4 Gradient boosting
2.5.2.2.4.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9694065
auc_optimism_corrected_CIL 0.9465650
auc_optimism_corrected_CIU 0.9947741
accuracy 1.0000000
accuracy_optimism_corrected 0.9138621
accuracy_optimism_corrected_CIL 0.8255102
accuracy_optimism_corrected_CIU 0.9639881
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 200.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.8818766
auc_optimism_corrected_CIL 0.8250024
auc_optimism_corrected_CIU 0.9331889
accuracy 1.0000000
accuracy_optimism_corrected 0.8505813
accuracy_optimism_corrected_CIL 0.7753125
accuracy_optimism_corrected_CIU 0.9131829
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_ra")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9152636
auc_optimism_corrected_CIL 0.8490806
auc_optimism_corrected_CIU 0.9591980
accuracy 1.0000000
accuracy_optimism_corrected 0.8440457
accuracy_optimism_corrected_CIL 0.7779221
accuracy_optimism_corrected_CIU 0.9026667
roc_c2.5.2.2.4.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9704068
auc_optimism_corrected_CIL 0.9420501
auc_optimism_corrected_CIU 0.9912599
accuracy 1.0000000
accuracy_optimism_corrected 0.9084848
accuracy_optimism_corrected_CIL 0.8403249
accuracy_optimism_corrected_CIU 0.9575208
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.8347726
auc_optimism_corrected_CIL 0.7841428
auc_optimism_corrected_CIU 0.8892752
accuracy 1.0000000
accuracy_optimism_corrected 0.8039108
accuracy_optimism_corrected_CIL 0.7626480
accuracy_optimism_corrected_CIU 0.8542208
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_ra")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9470524
auc_optimism_corrected_CIL 0.9072132
auc_optimism_corrected_CIU 0.9829447
accuracy 1.0000000
accuracy_optimism_corrected 0.8996262
accuracy_optimism_corrected_CIL 0.8474900
accuracy_optimism_corrected_CIU 0.9373579
roc_c2.5.2.3 Saving results
models_list <- list()
for (model_name in names(supplements_models$models_summ)){
df <- do.call(rbind, supplements_models$models_summ[[model_name]])
models_list[[model_name]] <- df
}
write.xlsx(models_list,
file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
rowNames=TRUE)2.6 Results overview
2.6.0.1 Alpha diversity
knitr::kable(pc_observed[[segment]],
digits = 3,
caption = "Results of linear model testing ASV Richness")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -53.642 | 11.742 | -4.568 | 0.000 | 0.000 | *** |
| healthy , pre_ltx - CZ vs NO | 11.219 | 10.488 | 1.070 | 0.287 | 0.361 | |
| healthy vs Grouppre_ltx:CountryNO | 6.313 | 15.403 | 0.410 | 0.683 | 0.683 | |
| pre_ltx vs Grouppost_ltx | 25.623 | 11.366 | 2.254 | 0.025 | 0.057 | |
| pre_ltx , post_ltx - CZ vs NO | 17.533 | 12.615 | 1.390 | 0.166 | 0.249 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -35.933 | 15.921 | -2.257 | 0.025 | 0.057 | |
| healthy vs Grouppost_ltx | -28.019 | 9.237 | -3.033 | 0.003 | 0.012 | * |
| healthy , post_ltx - CZ vs NO | 11.219 | 11.267 | 0.996 | 0.321 | 0.361 | |
| healthy vs Grouppost_ltx:CountryNO | -29.620 | 14.629 | -2.025 | 0.044 | 0.080 |
knitr::kable(pc_shannon[[segment]],
digits = 3,
caption = "Results of linear model testing Shannon index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.505 | 0.159 | -3.171 | 0.002 | 0.017 | * |
| healthy , pre_ltx - CZ vs NO | 0.007 | 0.142 | 0.046 | 0.963 | 0.963 | |
| healthy vs Grouppre_ltx:CountryNO | 0.144 | 0.209 | 0.687 | 0.493 | 0.634 | |
| pre_ltx vs Grouppost_ltx | 0.372 | 0.148 | 2.514 | 0.013 | 0.049 | * |
| pre_ltx , post_ltx - CZ vs NO | 0.150 | 0.164 | 0.915 | 0.361 | 0.542 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.502 | 0.207 | -2.423 | 0.016 | 0.049 | * |
| healthy vs Grouppost_ltx | -0.134 | 0.111 | -1.200 | 0.231 | 0.416 | |
| healthy , post_ltx - CZ vs NO | 0.007 | 0.136 | 0.048 | 0.962 | 0.963 | |
| healthy vs Grouppost_ltx:CountryNO | -0.358 | 0.176 | -2.031 | 0.044 | 0.098 |
knitr::kable(pc_simpson[[segment]],
digits = 3,
caption = "Results of linear model testing Simpson index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.046 | 0.019 | -2.481 | 0.014 | 0.129 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.017 | -0.466 | 0.642 | 0.692 | |
| healthy vs Grouppre_ltx:CountryNO | 0.027 | 0.024 | 1.092 | 0.277 | 0.498 | |
| pre_ltx vs Grouppost_ltx | 0.037 | 0.021 | 1.763 | 0.079 | 0.238 | |
| pre_ltx , post_ltx - CZ vs NO | 0.019 | 0.023 | 0.811 | 0.418 | 0.627 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.060 | 0.029 | -2.033 | 0.043 | 0.195 | |
| healthy vs Grouppost_ltx | -0.009 | 0.016 | -0.572 | 0.568 | 0.692 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.396 | 0.692 | 0.692 | |
| healthy vs Grouppost_ltx:CountryNO | -0.033 | 0.025 | -1.303 | 0.194 | 0.436 |
knitr::kable(pc_pielou[[segment]],
digits = 3,
caption = "Results of linear model testing Pielou index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.040 | 0.024 | -1.708 | 0.090 | 0.285 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.021 | -0.394 | 0.694 | 0.943 | |
| healthy vs Grouppre_ltx:CountryNO | 0.010 | 0.031 | 0.332 | 0.740 | 0.943 | |
| pre_ltx vs Grouppost_ltx | 0.042 | 0.022 | 1.922 | 0.056 | 0.285 | |
| pre_ltx , post_ltx - CZ vs NO | 0.002 | 0.024 | 0.082 | 0.935 | 0.943 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.051 | 0.030 | -1.677 | 0.095 | 0.285 | |
| healthy vs Grouppost_ltx | 0.001 | 0.017 | 0.072 | 0.943 | 0.943 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.407 | 0.684 | 0.943 | |
| healthy vs Grouppost_ltx:CountryNO | -0.041 | 0.027 | -1.525 | 0.129 | 0.290 |
Plots
alpha_div_plots[[paste(segment,"Country")]]2.6.0.2 Beta diversity
Main analysis
knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
digits = 3,
caption = "Results of PERMANOVA - robust aitchison distance")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 695.306 | 4.259 | 0.029 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 309.768 | 1.811 | 0.009 | 0.003 | 0.003 | ** |
| post_ltx vs healthy | 1 | 914.409 | 5.454 | 0.025 | 0.001 | 0.002 | ** |
| pre_ltx vs healthy , Country | 1 | 482.074 | 2.953 | 0.020 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 699.075 | 4.087 | 0.019 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 728.020 | 4.342 | 0.020 | 0.001 | 0.001 | *** |
| pre_ltx vs healthy : Country | 1 | 237.598 | 1.460 | 0.010 | 0.010 | 0.030 | * |
| pre_ltx vs post_ltx : Country | 1 | 237.265 | 1.390 | 0.007 | 0.021 | 0.032 | * |
| post_ltx vs healthy : Country | 1 | 202.888 | 1.211 | 0.006 | 0.106 | 0.106 |
PCA
pca_plots_list[[paste(segment,"genus custom")]]Supplementary analysis
knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta)) & (grepl("genus",names(supplements_beta)))],
digits = 3,
caption = "Supplementary PERMANOVA results: Bray-curtis, Jaccard distances")
|
|
PCA
ggarrange(plotlist = supplements_beta[grepl("PCoA",names(supplements_beta))],
labels=names(supplements_beta[grepl("PCoA",names(supplements_beta))]),
font.label = list(size=5,face="plain"),
ncol=2,nrow=3)2.6.0.3 Univariate analysis
Number of significant taxa
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
as.data.frame(lapply(psc_effect,nrow))) %>% t() %>%
`colnames<-`("Count") %>%
`rownames<-`(c(names(list_intersections),"PSC effect ASV","PSC effect Genus","PSC effect Phylum")),caption="Number of significant taxa")| Count | |
|---|---|
| terminal_ileum genus healthy vs pre_ltx | 25 |
| terminal_ileum genus pre_ltx vs post_ltx | 4 |
| terminal_ileum genus healthy vs post_ltx | 38 |
| terminal_ileum ASV healthy vs pre_ltx | 42 |
| terminal_ileum ASV pre_ltx vs post_ltx | 2 |
| terminal_ileum ASV healthy vs post_ltx | 65 |
| terminal_ileum phylum healthy vs pre_ltx | 1 |
| terminal_ileum phylum pre_ltx vs post_ltx | 0 |
| terminal_ileum phylum healthy vs post_ltx | 3 |
| PSC effect ASV | 15 |
| PSC effect Genus | 25 |
| PSC effect Phylum | 0 |
2.6.0.4 Machine learning
Main models
knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
digits=2,caption="Elastic net results")| alpha | lambda | auc_optimism_corrected | auc_optimism_corrected_CIL | auc_optimism_corrected_CIU | |
|---|---|---|---|---|---|
| pre_ltx vs healthy ASV terminal_ileum | 0.4 | 0.03 | 0.96 | 0.89 | 1.00 |
| pre_ltx vs post_ltx ASV terminal_ileum | 0.2 | 0.13 | 0.86 | 0.79 | 0.93 |
| post_ltx vs healthy ASV terminal_ileum | 0.2 | 0.05 | 0.96 | 0.90 | 0.99 |
| pre_ltx vs healthy genus terminal_ileum | 0.2 | 0.04 | 0.97 | 0.92 | 1.00 |
| pre_ltx vs post_ltx genus terminal_ileum | 0.0 | 0.67 | 0.83 | 0.75 | 0.92 |
| post_ltx vs healthy genus terminal_ileum | 0.0 | 0.07 | 0.97 | 0.92 | 0.99 |
ROC - Genus level
roc_curve_all_custom(roc_cs[c(4:6)],Q="Q1",
model_name="enet_model")Supplementary models
# Build final dataframe
models_list[["enet_model"]] <- models_summ_df_ileum
final_df <- tibble(row_names = rownames(models_list[[1]]))
# Loop through models and extract required values
for (model_name in names(models_list)) {
model_df <- models_list[[model_name]]
# Combine AUC_optimism_corrected with its CI values
final_df[[model_name]] <- paste0(
round(model_df$auc_optimism_corrected, 3),
" (", round(model_df$auc_optimism_corrected_CIL, 3), "; ",
round(model_df$auc_optimism_corrected_CIU, 3), ")"
)
}
knitr::kable(final_df, caption="All models")| row_names | knn_model | rf_model | gbm_model | enet_model_ra | knn_model_ra | rf_model_ra | gbm_model_ra | enet_model |
|---|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy ASV terminal_ileum | 0.871 (0.761; 0.954) | 0.947 (0.873; 0.995) | 0.955 (0.902; 0.992) | 0.924 (0.837; 0.972) | 0.829 (0.699; 0.932) | 0.975 (0.95; 0.993) | 0.969 (0.947; 0.995) | 0.956 (0.894; 0.996) |
| pre_ltx vs post_ltx ASV terminal_ileum | 0.686 (0.555; 0.794) | 0.856 (0.756; 0.941) | 0.868 (0.787; 0.935) | 0.734 (0.617; 0.834) | 0.666 (0.583; 0.736) | 0.901 (0.85; 0.953) | 0.882 (0.825; 0.933) | 0.865 (0.785; 0.932) |
| post_ltx vs healthy ASV terminal_ileum | 0.87 (0.78; 0.953) | 0.9 (0.817; 0.971) | 0.925 (0.852; 0.982) | 0.888 (0.818; 0.94) | 0.824 (0.743; 0.909) | 0.925 (0.878; 0.964) | 0.915 (0.849; 0.959) | 0.958 (0.9; 0.99) |
| pre_ltx vs healthy genus terminal_ileum | 0.887 (0.8; 0.96) | 0.95 (0.877; 0.996) | 0.973 (0.928; 1) | 0.939 (0.896; 0.979) | 0.819 (0.765; 0.874) | 0.967 (0.945; 0.986) | 0.97 (0.942; 0.991) | 0.966 (0.916; 0.996) |
| pre_ltx vs post_ltx genus terminal_ileum | 0.727 (0.608; 0.833) | 0.824 (0.719; 0.906) | 0.83 (0.74; 0.912) | 0.737 (0.654; 0.812) | 0.718 (0.66; 0.792) | 0.86 (0.825; 0.904) | 0.835 (0.784; 0.889) | 0.834 (0.749; 0.915) |
| post_ltx vs healthy genus terminal_ileum | 0.921 (0.857; 0.974) | 0.912 (0.821; 0.979) | 0.937 (0.868; 0.983) | 0.857 (0.801; 0.894) | 0.801 (0.734; 0.886) | 0.916 (0.847; 0.982) | 0.947 (0.907; 0.983) | 0.966 (0.922; 0.994) |
ROC - genus
rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs
plot_list <- list()
for (model_name in names(rocs_list)) {
plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
Q="Q1",
model_name=model_name)
}
p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=5))
p3 Data Analysis - Colon
segment="colon"3.1 Filtering
Rules: - sequencing depth > 10000 - nearZeroVar() with default settings
Rarefaction Curve
path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
factorNames="Country",
indexNames=c("Observe")) +
theme_bw()+
theme(axis.text=element_text(size=8),
panel.grid=element_blank(),
strip.background = element_rect(colour=NA,fill="grey"),
strip.text.x = element_text(face="bold")) +
geom_vline(xintercept = seq_depth_threshold,
linetype="dashed", color = "red") +
xlim(0, 20000)The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prareLibrary size
read_counts(colon_asv_tab, line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.1 Sequencing depth
data_filt <- seq_depth_filtering(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
seq_depth_threshold = 10000)Removing 75 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata
seq_step <- dim(filt_colon_asv_tab)[1]Library size
read_counts(filt_colon_asv_tab,line = c(10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.2 NearZeroVar
data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]Library size
read_counts(filt_colon_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Check zero depth
data_filt <- check_zero_depth(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]];
filt_colon_taxa_tab <- data_filt[[2]];
filt_colon_metadata <- data_filt[[3]]; Library size
read_counts(filt_colon_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.3 Final Counts
final_counts_filtering(colon_asv_tab,
filt_colon_asv_tab,
filt_colon_metadata,
seq_step, 0, nearzero_step) V1
Raw data: ASVs 6936
Raw data: Samples 789
Sequencing depth filt: ASVs 6861
Prevalence filt: ASVs 0
NearZeroVar filt: ASVs 381
Filt data: ASVs 381
Filt data: Samples 760
Filt data: Patients 354
Filt data: Patients.1 0
Filtered samples 29
Matrices Cecum;Rectum;CD;CA;SI
healthy 161
non-rPSC 0
rPSC 0
pre_ltx 250
post_ltx 349
ETOH 0
3.2 Alpha diversity
Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear mixed-effect model.
path = "../results/Q1/alpha_diversity"Calculation
# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
alpha_colon_taxa_tab,
alpha_colon_metadata))
colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)
# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
Observe=colon_mpse$Observe,
Shannon=colon_mpse$Shannon,
Simpson=colon_mpse$Simpson,
Pielou=colon_mpse$Pielou,
Group=colon_mpse$Group,
Country=colon_mpse$Country,
Patient=colon_mpse$Patient)
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
row.names = FALSE)3.2.1 Plots
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha
# see the results
p_boxplot_alpha3.2.2 Linear Model
path = "../results/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))Richness
results_model <- pairwise.lmer(
formula = "Observe ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_observe <- results_model[[1]]
results_model_observe_detailed <- results_model[[2]]
} else {
results_model_observe <- results_model
results_model_observe_detailed <- NA
}
# save the results
pc_observed[[segment]] <- results_model_observe# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -56.230 | 11.738 | 195.570 | -4.791 | 0.000 | 0.000 | *** |
| healthy vs pre_ltx - CZ vs NO | 14.193 | 10.341 | 194.538 | 1.372 | 0.171 | 0.256 | |
| healthy vs Grouppre_ltx:CountryNO | -7.489 | 15.169 | 194.358 | -0.494 | 0.622 | 0.622 | |
| pre_ltx vs Grouppost_ltx | 28.886 | 11.811 | 267.502 | 2.446 | 0.015 | 0.045 | * |
| pre_ltx vs post_ltx - CZ vs NO | 6.742 | 12.174 | 265.784 | 0.554 | 0.580 | 0.622 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -24.047 | 15.112 | 262.828 | -1.591 | 0.113 | 0.203 | |
| healthy vs Grouppost_ltx | -27.299 | 8.699 | 250.458 | -3.138 | 0.002 | 0.009 | ** |
| healthy vs post_ltx - CZ vs NO | 14.360 | 11.153 | 251.729 | 1.288 | 0.199 | 0.256 | |
| healthy vs Grouppost_ltx:CountryNO | -31.681 | 14.195 | 247.531 | -2.232 | 0.027 | 0.060 |
knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
Shannon
results_model <- pairwise.lmer(
formula = "Shannon ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_shannon <- results_model[[1]]
results_model_shannon_detailed <- results_model[[2]]
} else {
results_model_shannon <- results_model
results_model_shannon_detailed <- NA
}
# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.585 | 0.143 | 197.588 | -4.084 | 0.000 | 0.001 | *** |
| healthy vs pre_ltx - CZ vs NO | 0.015 | 0.126 | 196.213 | 0.116 | 0.907 | 0.909 | |
| healthy vs Grouppre_ltx:CountryNO | 0.081 | 0.185 | 195.965 | 0.437 | 0.663 | 0.852 | |
| pre_ltx vs Grouppost_ltx | 0.364 | 0.158 | 265.003 | 2.307 | 0.022 | 0.056 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.095 | 0.163 | 263.448 | 0.585 | 0.559 | 0.839 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.455 | 0.202 | 260.765 | -2.253 | 0.025 | 0.056 | |
| healthy vs Grouppost_ltx | -0.220 | 0.102 | 247.586 | -2.153 | 0.032 | 0.058 | |
| healthy vs post_ltx - CZ vs NO | 0.015 | 0.131 | 248.759 | 0.114 | 0.909 | 0.909 | |
| healthy vs Grouppost_ltx:CountryNO | -0.376 | 0.167 | 244.904 | -2.255 | 0.025 | 0.056 |
knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
|
Simpson
results_model <- pairwise.lmer(
formula = "Simpson ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_simpson <- results_model[[1]]
results_model_simpson_detailed <- results_model[[2]]
} else {
results_model_simpson <- results_model
results_model_simpson_detailed <- NA
}
# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.059 | 0.019 | 199.853 | -3.066 | 0.002 | 0.022 | * |
| healthy vs pre_ltx - CZ vs NO | -0.007 | 0.017 | 198.127 | -0.430 | 0.667 | 0.738 | |
| healthy vs Grouppre_ltx:CountryNO | 0.020 | 0.025 | 197.803 | 0.813 | 0.417 | 0.626 | |
| pre_ltx vs Grouppost_ltx | 0.031 | 0.026 | 263.530 | 1.213 | 0.226 | 0.455 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.013 | 0.027 | 261.964 | 0.476 | 0.635 | 0.738 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.052 | 0.033 | 259.262 | -1.556 | 0.121 | 0.363 | |
| healthy vs Grouppost_ltx | -0.027 | 0.017 | 246.082 | -1.594 | 0.112 | 0.363 | |
| healthy vs post_ltx - CZ vs NO | -0.007 | 0.022 | 247.135 | -0.335 | 0.738 | 0.738 | |
| healthy vs Grouppost_ltx:CountryNO | -0.032 | 0.028 | 243.697 | -1.147 | 0.253 | 0.455 |
knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")Table: Raw results of independent country analysis
Pielou
results_model <- pairwise.lmer(
formula = "Pielou ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_pielou <- results_model[[1]]
results_model_pielou_detailed <- results_model[[2]]
} else {
results_model_pielou <- results_model
results_model_pielou_detailed <- NA
}
# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.057 | 0.020 | 201.130 | -2.828 | 0.005 | 0.046 | * |
| healthy vs pre_ltx - CZ vs NO | -0.008 | 0.018 | 199.017 | -0.446 | 0.656 | 0.781 | |
| healthy vs Grouppre_ltx:CountryNO | 0.015 | 0.026 | 198.602 | 0.565 | 0.573 | 0.781 | |
| pre_ltx vs Grouppost_ltx | 0.039 | 0.024 | 265.797 | 1.631 | 0.104 | 0.234 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.007 | 0.025 | 263.735 | 0.274 | 0.785 | 0.785 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.062 | 0.031 | 260.213 | -2.015 | 0.045 | 0.202 | |
| healthy vs Grouppost_ltx | -0.018 | 0.016 | 247.076 | -1.130 | 0.260 | 0.467 | |
| healthy vs post_ltx - CZ vs NO | -0.008 | 0.020 | 248.451 | -0.394 | 0.694 | 0.781 | |
| healthy vs Grouppost_ltx:CountryNO | -0.047 | 0.026 | 243.881 | -1.828 | 0.069 | 0.206 |
knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
|
3.2.3 Saving results
alpha_list <- list(
Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
write.xlsx(alpha_list,
file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))3.3 Beta diversity
Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.
3.3.1 Main analysis
Genus level, Aitchison distance
level="genus"path = "../results/Q1/beta_diversity"Aggregation, filtering
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level=level,
names=TRUE)
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
colon_metadata,
seq_depth_threshold=10000)Removing 10 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]3.3.2 PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
interaction = TRUE,
patients = filt_colon_genus_metadata$Patient,
sim.method = "robust.aitchison", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1222.921 | 7.950 | 0.019 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 630.146 | 4.062 | 0.007 | 0.008 | 0.008 | ** |
| post_ltx vs healthy | 1 | 1736.169 | 11.014 | 0.021 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 738.576 | 4.801 | 0.011 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1532.675 | 9.879 | 0.016 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1728.746 | 10.966 | 0.021 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 318.719 | 2.077 | 0.005 | 0.325 | 0.488 | |
| pre_ltx vs post_ltx : Country | 1 | 322.405 | 2.082 | 0.003 | 0.524 | 0.524 | |
| post_ltx vs healthy : Country | 1 | 398.820 | 2.538 | 0.005 | 0.112 | 0.336 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_genus_metadata$Patient)
print(result_list)
}
}3.3.2.0.1 Plots
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
show_boxplots = TRUE,
variable = "Group", size=2,
show_legend= FALSE)
# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p
# see the results
p3.3.2.1 Saving results
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]],
file = file.path(path,
paste0("beta_diversity_results_", segment,".xlsx")))3.3.3 Supplementary analysis
3.3.3.1 Genus level
level="genus"3.3.3.1.1 Bray-Curtis
PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 2.119 | 10.732 | 0.024 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx | 1 | 1.307 | 6.239 | 0.010 | 0.001 | 0.001 | *** |
| post_ltx vs healthy | 1 | 3.286 | 17.150 | 0.031 | 0.001 | 0.001 | *** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 2.862 | 14.497 | 0.033 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 6.618 | 31.603 | 0.049 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 5.496 | 28.682 | 0.052 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.783 | 3.993 | 0.009 | 0.012 | 0.018 | * |
| pre_ltx vs post_ltx : Country | 1 | 0.562 | 2.691 | 0.004 | 0.112 | 0.112 | |
| post_ltx vs healthy : Country | 1 | 0.998 | 5.252 | 0.009 | 0.002 | 0.006 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
group1 = group1,
group2 = group2,
sim.method = 'bray')
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.680 0.0271 3.7602 0.05 0.05
Residual 135 24.413 0.9729
Total 136 25.093 1.0000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.222 0.03858 10.915 0.001 0.0013333
Residual 272 55.360 0.96142
Total 273 57.582 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 2.166 0.0386 9.9564 0.001 0.0013333
Residual 248 53.948 0.9614
Total 249 56.114 1.0000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.4791 0.05417 9.1064 0.001 0.0013333
Residual 159 25.8258 0.94583
Total 160 27.3049 1.00000
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.302 0.02297 7.0989 0.001 0.001
Residual 302 55.371 0.97703
Total 303 56.673 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.982 0.06786 14.924 0.001 0.001
Residual 205 40.966 0.93214
Total 206 43.949 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 5.014 0.06639 24.748 0.001 0.001
Residual 348 70.512 0.93361
Total 349 75.526 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.4791 0.05417 9.1064 0.001 0.001
Residual 159 25.8258 0.94583
Total 160 27.3049 1.00000
Plots
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p3.3.3.1.2 Jaccard
PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 2.225 | 7.844 | 0.018 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 1.382 | 4.680 | 0.007 | 0.002 | 0.002 | ** |
| post_ltx vs healthy | 1 | 3.371 | 12.044 | 0.022 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 3.013 | 10.622 | 0.025 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 6.474 | 21.918 | 0.035 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 5.595 | 19.994 | 0.037 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.939 | 3.330 | 0.008 | 0.010 | 0.015 | * |
| pre_ltx vs post_ltx : Country | 1 | 0.696 | 2.362 | 0.004 | 0.163 | 0.163 | |
| post_ltx vs healthy : Country | 1 | 1.134 | 4.078 | 0.008 | 0.001 | 0.003 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.759 0.02074 2.8587 0.061 0.061
Residual 135 35.842 0.97926
Total 136 36.601 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.405 0.02957 8.2869 0.001 0.0013333
Residual 272 78.937 0.97043
Total 273 81.342 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 2.196 0.02856 7.2916 0.001 0.0013333
Residual 248 74.696 0.97144
Total 249 76.892 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.756 0.04196 6.9641 0.001 0.0013333
Residual 159 40.083 0.95804
Total 160 41.839 1.00000
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.456 0.01743 5.3572 0.001 0.001
Residual 302 82.098 0.98257
Total 303 83.554 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 3.049 0.04918 10.604 0.001 0.001
Residual 205 58.937 0.95082
Total 206 61.986 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 4.974 0.04696 17.147 0.001 0.001
Residual 348 100.952 0.95304
Total 349 105.926 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.756 0.04196 6.9641 0.001 0.001
Residual 159 40.083 0.95804
Total 160 41.839 1.00000
Plots
Custom
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p3.3.3.2 ASV level
level="ASV"3.3.3.2.1 Aitchison
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(x=pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
sim.method = "robust.aitchison",
p.adjust.m="BH",
patients = filt_colon_metadata$Patient)
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
interaction = TRUE,
sim.method = "robust.aitchison",
p.adjust.m="BH",
patients = filt_colon_metadata$Patient)
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)
# see the results
pp_factor pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
1 pre_ltx vs healthy 1 1711.535 7.089160 0.016854638 0.001 0.0015 **
2 pre_ltx vs post_ltx 1 799.824 3.515868 0.005780561 0.023 0.0230 *
3 post_ltx vs healthy 1 2419.630 10.210591 0.019444444 0.001 0.0015 **
pp_cov pairs Df SumsOfSqs F.Model R2 p.value
1 pre_ltx vs healthy , Country 1 948.4242 3.928362 0.00933977 0.001
2 pre_ltx vs post_ltx , Country 1 1595.1673 7.012040 0.01152874 0.001
3 post_ltx vs healthy , Country 1 1870.9425 7.895186 0.01503512 0.001
p.adjusted sig
1 0.001 ***
2 0.001 ***
3 0.001 ***
pp_fac.cov pairs Df SumsOfSqs F.Model R2 p.value
1 pre_ltx vs healthy : Country 1 571.9973 2.377204 0.005632842 0.086
2 pre_ltx vs post_ltx : Country 1 439.3379 1.934270 0.003175223 0.768
3 post_ltx vs healthy : Country 1 626.6210 2.652897 0.005035604 0.023
p.adjusted sig
1 0.129
2 0.768
3 0.069
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1711.535 | 7.089 | 0.017 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 799.824 | 3.516 | 0.006 | 0.023 | 0.023 | * |
| post_ltx vs healthy | 1 | 2419.630 | 10.211 | 0.019 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 948.424 | 3.928 | 0.009 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1595.167 | 7.012 | 0.012 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1870.942 | 7.895 | 0.015 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 571.997 | 2.377 | 0.006 | 0.086 | 0.129 | |
| pre_ltx vs post_ltx : Country | 1 | 439.338 | 1.934 | 0.003 | 0.768 | 0.768 | |
| post_ltx vs healthy : Country | 1 | 626.621 | 2.653 | 0.005 | 0.023 | 0.069 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_metadata$Patient)
print(result_list)
}
}PCoA
p <- pca_plot_custom(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata,
show_boxplots = TRUE,
variable = "Group",
size=3,
show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p
# see the results
p3.3.3.2.2 Bray-Curtis
PERMANOVA
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 3.090 | 9.636 | 0.023 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx | 1 | 1.948 | 5.972 | 0.010 | 0.001 | 0.001 | *** |
| post_ltx vs healthy | 1 | 4.788 | 15.423 | 0.029 | 0.001 | 0.001 | *** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 2.383 | 7.429 | 0.017 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 5.328 | 16.338 | 0.026 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 4.912 | 15.825 | 0.029 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.901 | 2.821 | 0.007 | 0.040 | 0.060 | |
| pre_ltx vs post_ltx : Country | 1 | 0.830 | 2.551 | 0.004 | 0.131 | 0.131 | |
| post_ltx vs healthy : Country | 1 | 1.362 | 4.417 | 0.008 | 0.003 | 0.009 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_metadata$Patient,
sim.method = 'bray')
print(result_list)
}$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.098 0.02219 6.8301 0.001 0.001
Residual 301 92.461 0.97781
Total 302 94.559 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 4.052 0.05993 13.068 0.001 0.001
Residual 205 63.559 0.94007
Total 206 67.611 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 4.574 0.03999 14.456 0.001 0.001
Residual 347 109.796 0.96001
Total 348 114.370 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.700 0.03547 5.8478 0.001 0.001
Residual 159 46.225 0.96453
Total 160 47.925 1.00000
PCoA
p <- pca_plot_custom(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p3.3.3.2.3 Jaccard
PERMANOVA
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 2.453 | 6.302 | 0.015 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx | 1 | 1.695 | 4.312 | 0.007 | 0.001 | 0.001 | *** |
| post_ltx vs healthy | 1 | 3.721 | 9.734 | 0.018 | 0.001 | 0.001 | *** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 2.005 | 5.151 | 0.012 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 4.185 | 10.649 | 0.017 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 3.933 | 10.288 | 0.020 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 0.906 | 2.336 | 0.006 | 0.086 | 0.129 | |
| pre_ltx vs post_ltx : Country | 1 | 0.862 | 2.199 | 0.004 | 0.291 | 0.291 | |
| post_ltx vs healthy : Country | 1 | 1.291 | 3.393 | 0.006 | 0.004 | 0.012 | * |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.758 0.01512 4.6216 0.001 0.001
Residual 301 114.465 0.98488
Total 302 116.223 1.00000
$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 3.254 0.04003 8.5476 0.001 0.001
Residual 205 78.052 0.95997
Total 206 81.306 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 3.68 0.02673 9.5297 0.001 0.001
Residual 347 134.00 0.97327
Total 348 137.68 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.544 0.0257 4.1941 0.001 0.001
Residual 159 58.516 0.9743
Total 160 60.059 1.0000
PCoA
p <- pca_plot_custom(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p3.3.3.3 Saving results
write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
file = file.path(path,
paste0("supplements_beta_diversity_", segment,".xlsx")))3.4 Univariate Analysis
Using two DAA tools - linDA and Maaslin2. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.
Main analysis is performed at genus level. In supplementary analysis, it is performed at ASV and phylum levels.
3.4.1 Main analysis
level="genus"# needed paths
path = "../results/Q1/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q1",level)
# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()
# country and interaction problems
list_country_union <- list()
list_venns <- list()
# workbook for final df
wb <- createWorkbook()
# PSC effect
psc_effect <- list()3.4.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
colon_genus_taxa_tab)3.4.1.1.1 pre_ltx vs healthy
3.4.1.1.1.1 linDA
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 135 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 411 samples and 148 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.1.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])3.4.1.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 17 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 600 samples and 130 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.2.3 Group - Intersection
intersection_results <- group_intersection(group,
list_intersections,
list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.2.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])3.4.1.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,group,
usage="linDA")Removing 71 ASV(s)
Removing 5 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 511 samples and 153 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.3.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
colon_taxa_tab) + xlab("") + ylab("")min_clr -2.221209
max_clr 6.886638
min_log -3.771476
max_log 5.049706
dotheatmap_lindaHorizontal bar plot
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
ggplot() + theme_minimal(),
nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
dot_heatmap_colon <- p3.4.1.1.5 PSC effect
To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).
pre_LTx vs HC and Post_LTx vs HC intersection
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] SeqID
3 Rothia
15 Barnesiella
18 Butyricimonas
19 Odoribacter
27 Alistipes
28 Parabacteroides
39 Holdemania
57 Coprococcus
61 Fusicatenibacter
64 Hungatella
65 Lachnoclostridium
67 Lachnospiraceae_FCS020_group
89 Colidextribacter
91 Intestinimonas
93 Oscillibacter
109 Negativibacillus
122 Family_XIII_AD3011_group
133 Dialister
135 Veillonella
143 Enterococcus
146 Klebsiella
148 Pseudomonas
Taxonomy
3 k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Micrococcales;f__Micrococcaceae;g__Rothia
15 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
18 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
19 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
28 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
39 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
57 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
61 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
64 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Hungatella
65 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
67 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
89 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Colidextribacter
91 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas
93 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
109 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus
122 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
133 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Dialister
135 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella
143 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
146 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
148 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
log2FoldChange p_value padj
3 1.842842 1.452081e-05 1.953709e-04
15 -3.590912 7.871469e-05 6.852808e-04
18 -3.002837 3.969499e-06 7.343572e-05
19 -3.406707 3.432387e-07 1.693311e-05
27 -3.110558 1.594988e-05 1.967152e-04
28 -3.568149 2.950689e-06 6.238599e-05
39 -1.174869 7.087889e-03 2.439553e-02
57 -3.357534 5.546126e-06 8.208266e-05
61 -1.917547 1.097200e-02 3.455012e-02
64 2.775798 1.422148e-04 1.107779e-03
65 -1.736481 1.812625e-03 8.383390e-03
67 -2.111851 1.228192e-03 6.268013e-03
89 -1.624469 8.895010e-03 2.991958e-02
91 -1.592037 6.335927e-03 2.404403e-02
93 -2.063990 9.829922e-04 5.538365e-03
109 -1.649295 1.442001e-02 4.268322e-02
122 -1.590837 1.010377e-03 5.538365e-03
133 2.144460 1.055678e-02 3.396528e-02
135 3.864870 1.310501e-07 9.697706e-06
143 5.049706 2.872348e-10 4.251075e-08
146 2.288824 1.471023e-03 7.257045e-03
148 1.318798 5.015789e-03 2.006316e-02
3.4.1.2 Saving results
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
overwrite = TRUE)
# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("significant_taxa_",segment,".xlsx")))3.4.2 Supplementary Analysis
3.4.2.1 ASV level
level="ASV"path_maaslin="../intermediate_files/maaslin/Q1/ASV/"raw_linda_results[[segment]] <- list()
linda_results[[segment]] <- list()
supplements_wb <- createWorkbook()3.4.2.1.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 2138 ASV(s)
Removing 70 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 411 samples and 368 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Country effect")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcanoMaAsLin2
Volcano plot
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
# see the results
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.2.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(colon_asv_tab,colon_taxa_tab,
colon_metadata,group, usage="linDA")Removing 1157 ASV(s)
Removing 52 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 599 samples and 249 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Country effect")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
# see the results
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment=segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.2.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 1096 ASV(s)
Removing 50 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 510 samples and 370 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Country effect")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.2.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,uni_statistics$colon[grepl(level,names(uni_statistics$colon))],colon_taxa_tab)min_clr -1.270472
max_clr 5.320228
min_log -4.431053
max_log 4.150355
dotheatmap_linda3.4.2.1.5 PSC effect
pre_LTx vs Healthy and Post_LTx vs Healthy intersection
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] SeqID
10 TAGGGAATCTTCGGCAATGGGGGCAACCCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTAAGTCAAGAACGAGTGTGAGAGTGGAAAGTTCACACTGTGACGGTAGCTTACCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTGATAAGTCTGAAGTTAAAGGCTGTGGCTCAACCATAGTTCGCTTTGGAAACTGTCAAACTTGAGTGCAGAAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCGGTGGCGAAAGCGGCTCTCTGGTCTGTAACTGACGCT
15 TAGGGAATTTTCGGCAATGGGGGAAACCCTGACCGAGCAACGCCGCGTGAAGGAAGAAGTAATTCGTTATGTAAACTTCTGTCATAGAGGAAGAACGGTGGATATAGGGAATGATATCCAAGTGACGGTACTCTATAAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCACTAAGGGTCTGTGGTGAAAGATCGAAGCTTAACTTCGGTAAGCCATGGAAACCGTAGAGCTAGAGTGTGTGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCGCATAACTGACGCTC
20 TAGGGAATTTTCGTCAATGGGGGAAACCCTGAACGAGCAATGCCGCGTGAGTGAAGAAGGTCTTCGGATCGTAAAGCTCTGTTGTAAGTGAAGAACGGCTCATAGAGGAAATGCTATGGGAGTGACGGTAGCTTACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATCATTGGGCGTAAAGGGTGCGTAGGTGGCGTACTAAGTCTGTAGTAAAAGGCAATGGCTCAACCATTGTAAGCTATGGAAACTGGTATGCTGGAGTGCAGAAGAGGGCGATGGAATTCCATGTGTAGCGGTAAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGTCGCCTGGTCTGTAACTGACACTG
28 TGAGGAATATTGGGCAATGGAGGAAACTCTGACCCAGCCATGCCGCGTGAGTGAAGAAGGTTTTCGAATTGTAAAGCTCTTTCGGATGGGACGATGATGACGGTACCATCTAAAGAAGCCCCGGCAAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGTGTGTAGGCGGATTTGTAAGATAGTGGTGAAATACCTGAGCTCAACTTAGGAATTGCCATTATAACTATAGATCTGGAGTGACAGAGAGGATATTGGAATACCCAGTGTAGAGGTGAAATTCGTAGATATTGGGTAGAACACCAGTGGCGAAGGCGAGTATCTGGCTGTCAACTGACGCTGAGGCACGAAAGCATGGGGATCAAA
48 TGAGGAATATTGGTCAATGGACGAGAGTCTGAACCAGCCAAGTCGCGTGAGGGAAGACTGCCCTATGGGTTGTAAACCTCTTTTATAAGGGAAGAATAAGTTCTACGTGTAGAATGATGCCTGTACCTTATGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTATTAAGTTAGTGGTTAAATATTTGAGCTAAACTCAATTGTGCCATTAATACTGGTAAACTGGAGTACAGACGAGGTAGGCGGAATAAGTTAAGTAGCGGTGAAATGCATAGATATAACTTAGAACTCCGATAGCGAAGGCAGCTTACCAGACTGTAACTGACGCTGATGCA
54 TGAGGAATATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGCAGGATGACGGCTCTATGAGTTGTAAACTGCTTTTGTACGAGGGTAAACGCAGATACGTGTATCTGTCTGAAAGTATCGTACGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATTTCGGGGCTCAACCCTGAACGTGCCTCTAATACTGTTGAGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGC
57 TGAGGAATATTGGTCAATGGACGCGAGTCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATATGGGAATAAAGTTGTCCACGTGTGGATTTTTGTATGTACCATATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGCAGTCTTGAGTGCAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGAGTGTAACTGACGCTGATGC
97 TGGGGAATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCACGGCAAGCCAGATGTGAAAGCCCGGGGCTCAACCCCGGGACTGCATTTGGAACTGCTGAGCTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
111 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCACTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGAGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
112 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGCAAGTCTGAAGTGAAAGGCATGGGCTCAACCTGTGGACTGCTTTGGAAACTGTGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
115 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCGATGCAAGTCTGGAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTATGGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
116 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAAGCAAGTCTGAAGTGAAAACCCAGGGCTCAACCCTGGGACTGCTTTGGAAACTGTTTTGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
117 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTAAAGCAAGTCTGAAGTGAAAGCCCGCGGCTCAACTGCGGGACTGCTTTGGAAACTGTTTAACTGGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGACTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
137 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGCCTTTGGGTCGTAAACTTCTGTTCTAAGGGAAGATAGTGACGGTACCTTAGGAGCAAGTCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGTACGTAGGTGGTTTTCTAAGCACGGGGTTTAAGGCAATGGCTTAACCATTGTTCGCCTTGTGAACTGGAAGACTTGAGTGCAGGAGAGGAAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGACTGTAACTGACACTGAGGTACGAAAGCGTGGGGAGCAAAC
139 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGAAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTTTTGCAAGTCTGAAGTGAAAGCCCGGGGCTTAACCCCGGGACTGCTTTGGAAACTGTAGAACTAGAGTGCAGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
140 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGAAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTTTTGCAAGTCTGAAGTGAAAGCCCGGGGCTTAACCCCGGGACTGCTTTGGAAACTGTAGGACTAGAGTGCAGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
142 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAGCGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGAATGGCTTTGGAAACTGTGCAGCTAGAGTACCGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
145 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGACAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGCCTTGCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
169 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCATTGCAAGCCAGATGTGAAAGCCCGGGGCTCAACCCCGGGACTGCATTTGGAACTGTAGAGCTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
170 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTATGGCAAGTCTGATGTGAAAGGCCAGGGCTCAACCCTGGGACTGCATTGGAAACTGTCGAACTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
175 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCACCGGCTAAATACGTGCCAGCAGCCGCGGTAATACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGTAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTATGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCCAGTGTAGCGGTGAAATGCGTAGATATTGGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
180 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGGCAAGTCTGATGTGAAAACCCGGGGCTCAACCCCGTGACTGCATTGGAAACTGTTTTGCTTGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGCAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
187 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGATAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAAGGCAAGTCTGATGTGAAAACCCAGGGCTTAACCCTGGGACTGCATTGGAAACTGTCTGGCTCGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
210 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGTATGGCAAGTCAGAGGTGAAAACCCAGGGCTTAACCTTGGGATTGCCTTTGAAACTGTCAGACTAGAGTGCAGGAGGGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
237 TGGGGAATATTGCGCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGATTGAAGAAGGCCTTCGGGTTGTAAAGATCTTTAATCAGGGACGAAACAAATGACGGTACCTGAAGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGCGCAGGCGGGCCGGTAAGTTGGAAGTGAAATCTATGGGCTTAACCCATAAACTGCTTTCAAAACTGCTGGTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
242 TGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTGTCAGGGAACAGTAGAAGAGGGTACCTGACGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGGCTGGCAAGTCAGGCGTGAAATCCCAGGGCTCAACCCTGGAACTGCGTTTGAAACTGCTGGTCTTGAGTACCGGAGAGGTCATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATAAGGAAGAACACCAGTGGCGAAGGCGGATGACTGGACGGCAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCA
246 TGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTGTCAGGGAAGAGTAGAAGACGGTACCTGACGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGGCCGGCAAGTCAGATGTGAAATCTGGAGGCTTAACCTCCAAACTGCATTTGAAACTGTAGGTCTTGAGTACCGGAGAGGTTATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATAAGGAAGAACACCAGTGGCGAAGGCGGATAACTGGACGGCAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCA
262 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTATGAGGGACGAAGGAAGTGACGGTACCTCATGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCGAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGGAAGCAAGTCAGATGTGAAAACCAGTGGCTCAACCACTGGCCTGCATTTGAAACTGTTTTTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
264 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTCTCAGGGACGAACAAATGACGGTACCTGAGGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGAAGGCAAGTCAGATGTGAAAACTATGGGCTCAACCCATAGCCTGCATTTGAAACTGTTTTTCTTGAGTGCTGGAGAGGCAATCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGATTGCTGGACAGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
292 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAAAGGCCTTCTTGCGAATAGTGAGAAGGATTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATAGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
293 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAAAGGCCTTCTTGCGAATAGTTAGAAGGATTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATTGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
325 TGGGGGATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGACGCAAGTCTGAAGTGAAATACCCGGGCTCAACCTGGGAACTGCTTTGGAAACTGTGTTGCTAGAGTGCTGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
332 TGGGGGATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGGGAAGACGGCCTTCGGGTTGTAAACCTCTGTCTTCGGGGACGAATAAATGACGGTACCCGAGGAGGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTACTGGGTGTAAAGGGAGCGTAGGCGGGGAGGCAAGTTGAATGTCTAAACTATCGGCTCAACTGATAGTCGCGTTCAAAACTGCCACTCTTGAGTGCAGTAGAGGTAGGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCCTACTGGGCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGTAGCAA
335 TGGGGGATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGGAGGAAGAAGGTTTTCGGATTGTAAACTCCTGTCGTTAGGGACGATAATGACGGTACCTAACAAGAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAAAACGTAGGGTGCAAGCGTTGTCCGGAATTACTGGGTGTAAAGGGAGCGCAGGCGGACCGGCAAGTTGGAAGTGAAAACTATGGGCTCAACCCATAAATTGCTTTCAAAACTGCTGGCCTTGAGTAGTGCAGAGGTAGGTGGAATTCCCGGTGTAGCGGTGGAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGACCTACTGGGCACCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAA
346 TAGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTAGAGAAGAACAAGGATGAGAGTAACTGTTCATCCCTTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCT
357 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAGTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTGCAGCTAGAGTGTCGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
367 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAATGGTTCTTGTGCAAATAGTGCGAGGATTTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATTGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
Taxonomy
10 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__unassigned
15 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelatoclostridiaceae;g__Erysipelotrichaceae_UCG-003;s__unassigned
20 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__[Clostridium]_innocuum_group;s__unassigned
28 k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__uncultured;g__uncultured;s__unassigned
48 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter;s__unassigned
54 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__unassigned
57 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__unassigned
97 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
111 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
112 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
115 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
116 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
117 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
137 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group;s__unassigned
139 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
140 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
142 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
145 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
169 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
170 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
175 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__unassigned
180 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__CAG-56;s__unassigned
187 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter;s__unassigned
210 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
237 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus;s__unassigned
242 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter;s__unassigned
246 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter;s__unassigned
262 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__uncultured;s__unassigned
264 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas;s__unassigned
292 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
293 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
325 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_UCG-004;s__unassigned
332 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus;s__unassigned
335 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Subdoligranulum;s__unassigned
346 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus;s__unassigned
357 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
367 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
log2FoldChange p_value padj
10 2.6718947 1.020219e-04 2.208475e-03
15 -3.7840235 1.431558e-05 4.052411e-04
20 1.7161594 2.227124e-03 2.129950e-02
28 -1.8328032 2.682573e-03 2.295784e-02
48 -2.4005935 5.703093e-04 7.773105e-03
54 -4.2728652 1.391165e-06 6.399358e-05
57 -2.7446018 2.373042e-03 2.129950e-02
97 -3.1314152 1.355523e-04 2.625433e-03
111 -4.4310528 8.690369e-07 5.330093e-05
112 -3.9852492 2.068392e-07 1.902921e-05
115 -2.4672999 1.366106e-03 1.675756e-02
116 3.2596602 2.646199e-04 4.426370e-03
117 3.3617848 2.408327e-05 6.330459e-04
137 -1.5682632 1.246219e-04 2.547825e-03
139 1.7316915 7.115666e-03 4.438246e-02
140 -1.9647594 7.516426e-03 4.610074e-02
142 -2.1424529 1.739921e-03 1.829402e-02
145 -4.2349983 1.293743e-06 6.399358e-05
169 -2.1064068 5.031662e-05 1.234435e-03
170 -3.1944423 2.168950e-08 3.990869e-06
175 -4.3755342 7.466843e-08 9.159327e-06
180 -3.1928924 3.580378e-04 5.362121e-03
187 -2.2441970 3.571741e-03 2.738335e-02
210 -1.5342059 2.556985e-03 2.240406e-02
237 -2.1219142 1.656897e-03 1.829402e-02
242 -0.8792198 3.883155e-03 2.916329e-02
246 -2.8737528 7.715727e-06 2.581261e-04
262 -2.0515366 3.567012e-03 2.738335e-02
264 -2.4237638 3.897547e-06 1.434297e-04
292 3.0513101 3.648769e-06 1.434297e-04
293 1.7601475 6.362803e-03 4.037089e-02
325 -2.2782634 1.658920e-03 1.829402e-02
332 -1.8536497 9.239523e-04 1.214337e-02
335 -3.4581653 4.652641e-04 6.585276e-03
346 3.6455563 4.788864e-07 3.524604e-05
357 1.4450185 4.835811e-03 3.469320e-02
367 1.7110108 3.642745e-04 5.362121e-03
3.4.2.2 Phylum level
level="phylum"path_maaslin="../intermediate_files/maaslin/Q1/Phylum/"raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum[[segment]] <- list()Aggregate taxa
phylum_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = "Phylum")
colon_phylum_tab <- phylum_data[[1]]
colon_phylum_taxa_tab <- phylum_data[[2]]3.4.2.2.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
colon_phylum_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 2 ASV(s)
Removing 1 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 411 samples and 8 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))Using Phylum for naming
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Country effect")Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcanoMaAsLin2
Volcano plot
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
# see the results
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.2.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
colon_phylum_taxa_tab,
colon_metadata,group, usage="linDA")Removing 1 ASV(s)
Removing 1 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 600 samples and 8 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))Using Phylum for naming
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Country effect")Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
# see the results
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment=segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.2.2.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])linDA
# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
colon_phylum_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 2 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 511 samples and 8 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))Using Phylum for naming
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Country effect")Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcanoMaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcanoGroup - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
vennInteraction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.2.2.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,uni_statistics$colon[grepl(level,names(uni_statistics$colon))],colon_taxa_tab)min_clr -4.297728
max_clr 5.183865
min_log -2.728687
max_log 2.86356
dotheatmap_linda3.4.2.2.5 PSC effect
pre_LTx vs Healthy and Post_LTx vs Healthy intersection
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] [1] SeqID Taxonomy log2FoldChange p_value padj
<0 rows> (or 0-length row.names)
3.4.2.3 Saving results
# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)
# PSC effect
write.xlsx(psc_effect,
file.path(path,paste0("supplements_psc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))3.5 Machine learning
Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.
The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.
path = "../results/Q1/models"3.5.1 ENET
model="enet"3.5.1.1 ASV level
level="ASV"3.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.200000000
lambda 0.007285712
auc 1.000000000
auc_czech 1.000000000
auc_no 1.000000000
auc_optimism_corrected 0.944482930
auc_optimism_corrected_CIL 0.879483337
auc_optimism_corrected_CIU 0.986503241
accuracy 1.000000000
accuracy_czech NaN
accuracy_no 1.000000000
accuracy_optimism_corrected 0.872590475
accuracy_optimism_corrected_CIL 0.773863933
accuracy_optimism_corrected_CIU 0.937692308
enet_model$conf_matrices$original
Predicted
True 0 1
0 161 0
1 0 250
$czech
Predicted
True 0 1
0 95 0
1 0 42
$no
Predicted
True 0 1
0 66 0
1 0 208
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group", N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 1.000000000
lambda 0.004424186
auc 0.999667622
auc_czech 0.999656593
auc_no 0.999522640
auc_optimism_corrected 0.788320408
auc_optimism_corrected_CIL 0.715647773
auc_optimism_corrected_CIU 0.842743983
accuracy 0.993322204
accuracy_czech NaN
accuracy_no 0.994269341
accuracy_optimism_corrected 0.725877580
accuracy_optimism_corrected_CIL 0.657080387
accuracy_optimism_corrected_CIU 0.769266055
enet_model$conf_matrices$original
Predicted
True 0 1
0 347 2
1 2 248
$czech
Predicted
True 0 1
0 207 1
1 1 41
$no
Predicted
True 0 1
0 140 1
1 1 207
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.40000000
lambda 0.00203662
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.96930614
auc_optimism_corrected_CIL 0.93705851
auc_optimism_corrected_CIU 0.98669590
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.91353052
accuracy_optimism_corrected_CIL 0.86451155
accuracy_optimism_corrected_CIU 0.95141794
enet_model$conf_matrices$original
Predicted
True 0 1
0 161 0
1 0 349
$czech
Predicted
True 0 1
0 95 0
1 0 208
$no
Predicted
True 0 1
0 66 0
1 0 141
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]3.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.04262835
auc 0.99955280
auc_czech 0.99974937
auc_no 0.99927156
auc_optimism_corrected 0.93420491
auc_optimism_corrected_CIL 0.88169929
auc_optimism_corrected_CIU 0.97185321
accuracy 0.98783455
accuracy_czech NaN
accuracy_no 0.98905109
accuracy_optimism_corrected 0.85834482
accuracy_optimism_corrected_CIL 0.79082492
accuracy_optimism_corrected_CIU 0.92284119
enet_model$conf_matrices$original
Predicted
True 0 1
0 159 2
1 3 247
$czech
Predicted
True 0 1
0 94 1
1 1 41
$no
Predicted
True 0 1
0 65 1
1 2 206
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.05181695
auc 0.96996571
auc_czech 0.95887446
auc_no 0.96447081
auc_optimism_corrected 0.75746917
auc_optimism_corrected_CIL 0.67636751
auc_optimism_corrected_CIU 0.83310363
accuracy 0.91000000
accuracy_czech NaN
accuracy_no 0.90544413
accuracy_optimism_corrected 0.69549915
accuracy_optimism_corrected_CIL 0.61504214
accuracy_optimism_corrected_CIU 0.77137530
enet_model$conf_matrices$original
Predicted
True 0 1
0 325 25
1 29 221
$czech
Predicted
True 0 1
0 202 7
1 14 28
$no
Predicted
True 0 1
0 123 18
1 15 193
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.600000000
lambda 0.002245931
auc 1.000000000
auc_czech 1.000000000
auc_no 1.000000000
auc_optimism_corrected 0.961311372
auc_optimism_corrected_CIL 0.923354885
auc_optimism_corrected_CIU 0.985361058
accuracy 1.000000000
accuracy_czech NaN
accuracy_no 1.000000000
accuracy_optimism_corrected 0.892756667
accuracy_optimism_corrected_CIL 0.838856178
accuracy_optimism_corrected_CIU 0.940785776
enet_model$conf_matrices$original
Predicted
True 0 1
0 161 0
1 0 350
$czech
Predicted
True 0 1
0 95 0
1 0 209
$no
Predicted
True 0 1
0 66 0
1 0 141
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.1.3 Saving results
models_summ_df_colon <- do.call(rbind,
models_summ[grep(segment,names(models_summ),value = TRUE)])
write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))3.5.2 Supplementary models
3.5.2.1 CLR-transformed data
3.5.2.1.1 kNN
model="knn"3.5.2.1.1.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 17.0000000
auc 0.9278012
auc_optimism_corrected 0.8390623
auc_optimism_corrected_CIL 0.7635165
auc_optimism_corrected_CIU 0.9036531
accuracy 0.8150852
accuracy_optimism_corrected 0.7496215
accuracy_optimism_corrected_CIL 0.6860774
accuracy_optimism_corrected_CIU 0.8177634
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 10.0000000
auc 0.9259427
auc_optimism_corrected 0.6312841
auc_optimism_corrected_CIL 0.5483666
auc_optimism_corrected_CIU 0.7185543
accuracy 0.8363940
accuracy_optimism_corrected 0.6068770
accuracy_optimism_corrected_CIL 0.5126121
accuracy_optimism_corrected_CIU 0.6853063
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 10.0000000
auc 0.9701009
auc_optimism_corrected 0.8647529
auc_optimism_corrected_CIL 0.8049770
auc_optimism_corrected_CIU 0.9012555
accuracy 0.9019608
accuracy_optimism_corrected 0.8252680
accuracy_optimism_corrected_CIL 0.8068314
accuracy_optimism_corrected_CIU 0.8466299
roc_c3.5.2.1.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
# see the results
knn_model$model_summary %>% t() [,1]
k 26.0000000
auc 0.9423478
auc_optimism_corrected 0.8630524
auc_optimism_corrected_CIL 0.7819441
auc_optimism_corrected_CIU 0.9295728
accuracy 0.8515815
accuracy_optimism_corrected 0.7772840
accuracy_optimism_corrected_CIL 0.6761526
accuracy_optimism_corrected_CIU 0.8621100
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 15.0000000
auc 0.8733486
auc_optimism_corrected 0.6517804
auc_optimism_corrected_CIL 0.5548968
auc_optimism_corrected_CIU 0.7336038
accuracy 0.7800000
accuracy_optimism_corrected 0.6081844
accuracy_optimism_corrected_CIL 0.5233996
accuracy_optimism_corrected_CIU 0.6846266
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0.004"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 30.0000000
auc 0.9530169
auc_optimism_corrected 0.8895200
auc_optimism_corrected_CIL 0.8250313
auc_optimism_corrected_CIU 0.9512991
accuracy 0.8395303
accuracy_optimism_corrected 0.7630536
accuracy_optimism_corrected_CIL 0.6610833
accuracy_optimism_corrected_CIU 0.8620708
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.2.1.2 RF
model="rf"3.5.2.1.2.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "143"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9599069"
auc_optimism_corrected_CIL "0.9178946"
auc_optimism_corrected_CIU "0.9905348"
accuracy "1"
accuracy_optimism_corrected "0.8984082"
accuracy_optimism_corrected_CIL "0.8416505"
accuracy_optimism_corrected_CIU "0.9411982"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "37"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.7945271"
auc_optimism_corrected_CIL "0.7586882"
auc_optimism_corrected_CIU "0.8382909"
accuracy "1"
accuracy_optimism_corrected "0.7230647"
accuracy_optimism_corrected_CIL "0.6800637"
accuracy_optimism_corrected_CIU "0.7797127"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "237"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9177307"
auc_optimism_corrected_CIL "0.8786343"
auc_optimism_corrected_CIU "0.95099"
accuracy "1"
accuracy_optimism_corrected "0.8817967"
accuracy_optimism_corrected_CIL "0.8375135"
accuracy_optimism_corrected_CIU "0.929279"
roc_c3.5.2.1.2.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "23"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9488154"
auc_optimism_corrected_CIL "0.8952989"
auc_optimism_corrected_CIU "0.9862301"
accuracy "1"
accuracy_optimism_corrected "0.8831177"
accuracy_optimism_corrected_CIL "0.8094864"
accuracy_optimism_corrected_CIU "0.9377419"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "71"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.7772877"
auc_optimism_corrected_CIL "0.6827278"
auc_optimism_corrected_CIU "0.8559813"
accuracy "1"
accuracy_optimism_corrected "0.6880011"
accuracy_optimism_corrected_CIL "0.6"
accuracy_optimism_corrected_CIU "0.770192"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "65"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9358754"
auc_optimism_corrected_CIL "0.8669191"
auc_optimism_corrected_CIU "0.9857715"
accuracy "1"
accuracy_optimism_corrected "0.8946156"
accuracy_optimism_corrected_CIL "0.8198506"
accuracy_optimism_corrected_CIU "0.9517422"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.2.1.3 Gradient boosting
model="gb"3.5.2.1.3.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9711391
auc_optimism_corrected_CIL 0.9410907
auc_optimism_corrected_CIU 0.9950271
accuracy 1.0000000
accuracy_optimism_corrected 0.9217283
accuracy_optimism_corrected_CIL 0.8795956
accuracy_optimism_corrected_CIU 0.9639860
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.8062275
auc_optimism_corrected_CIL 0.7559881
auc_optimism_corrected_CIU 0.8528397
accuracy 1.0000000
accuracy_optimism_corrected 0.7310713
accuracy_optimism_corrected_CIL 0.6566925
accuracy_optimism_corrected_CIU 0.8001997
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 30.0000000
auc 1.0000000
auc_optimism_corrected 0.9643631
auc_optimism_corrected_CIL 0.9401224
auc_optimism_corrected_CIU 0.9858289
accuracy 1.0000000
accuracy_optimism_corrected 0.9249490
accuracy_optimism_corrected_CIL 0.8934412
accuracy_optimism_corrected_CIU 0.9505495
roc_c3.5.2.1.3.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9655809
auc_optimism_corrected_CIL 0.9237801
auc_optimism_corrected_CIU 0.9915438
accuracy 1.0000000
accuracy_optimism_corrected 0.9095237
accuracy_optimism_corrected_CIL 0.8466613
accuracy_optimism_corrected_CIU 0.9587695
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.7993128
auc_optimism_corrected_CIL 0.7208963
auc_optimism_corrected_CIU 0.8672809
accuracy 1.0000000
accuracy_optimism_corrected 0.7120601
accuracy_optimism_corrected_CIL 0.6299534
accuracy_optimism_corrected_CIU 0.7840918
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9632183
auc_optimism_corrected_CIL 0.9218211
auc_optimism_corrected_CIU 0.9922280
accuracy 1.0000000
accuracy_optimism_corrected 0.9189343
accuracy_optimism_corrected_CIL 0.8639337
accuracy_optimism_corrected_CIU 0.9631778
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.2.2 Relative abundances
3.5.2.2.1 Elastic net
3.5.2.2.1.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.400000000
lambda 0.001453833
auc 1.000000000
auc_czech 1.000000000
auc_no 1.000000000
auc_optimism_corrected 0.958045499
auc_optimism_corrected_CIL 0.919974493
auc_optimism_corrected_CIU 0.979096386
accuracy 1.000000000
accuracy_czech NaN
accuracy_no 1.000000000
accuracy_optimism_corrected 0.896262270
accuracy_optimism_corrected_CIL 0.844885708
accuracy_optimism_corrected_CIU 0.933223776
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 1.000000000
lambda 0.008597151
auc 0.994934097
auc_czech 0.997367216
auc_no 0.990964266
auc_optimism_corrected 0.693086871
auc_optimism_corrected_CIL 0.599715149
auc_optimism_corrected_CIU 0.758895468
accuracy 0.954924875
accuracy_czech NaN
accuracy_no 0.936962751
accuracy_optimism_corrected 0.658614418
accuracy_optimism_corrected_CIL 0.578956424
accuracy_optimism_corrected_CIU 0.714558559
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.40000000
lambda 0.01071588
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.92787523
auc_optimism_corrected_CIL 0.88438933
auc_optimism_corrected_CIU 0.96548937
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.86825268
accuracy_optimism_corrected_CIL 0.82293956
accuracy_optimism_corrected_CIU 0.91311501
roc_c3.5.2.2.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.600000000
lambda 0.005344515
auc 1.000000000
auc_czech 1.000000000
auc_no 1.000000000
auc_optimism_corrected 0.941556388
auc_optimism_corrected_CIL 0.911600992
auc_optimism_corrected_CIU 0.972603351
accuracy 1.000000000
accuracy_czech NaN
accuracy_no 1.000000000
accuracy_optimism_corrected 0.892543470
accuracy_optimism_corrected_CIL 0.856537557
accuracy_optimism_corrected_CIU 0.916957058
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.05444562
auc 0.96121143
auc_czech 0.98359535
auc_no 0.92307692
auc_optimism_corrected 0.71357822
auc_optimism_corrected_CIL 0.64835385
auc_optimism_corrected_CIU 0.78840887
accuracy 0.88666667
accuracy_czech NaN
accuracy_no 0.84240688
accuracy_optimism_corrected 0.65567603
accuracy_optimism_corrected_CIL 0.61790490
accuracy_optimism_corrected_CIU 0.71605550
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary
supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.03612602
auc 0.99948536
auc_czech 0.99919416
auc_no 0.99978508
auc_optimism_corrected 0.90741276
auc_optimism_corrected_CIL 0.85907537
auc_optimism_corrected_CIU 0.93834169
accuracy 0.98630137
accuracy_czech NaN
accuracy_no 0.99033816
accuracy_optimism_corrected 0.84975642
accuracy_optimism_corrected_CIL 0.80744317
accuracy_optimism_corrected_CIU 0.89348916
roc_c3.5.2.2.2 kNN
3.5.2.2.2.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 10.0000000
auc 0.9469814
auc_optimism_corrected 0.8313865
auc_optimism_corrected_CIL 0.7623059
auc_optimism_corrected_CIU 0.9127872
accuracy 0.8418491
accuracy_optimism_corrected 0.7280742
accuracy_optimism_corrected_CIL 0.6065461
accuracy_optimism_corrected_CIU 0.8395928
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 11.0000000
auc 0.8853123
auc_optimism_corrected 0.6461102
auc_optimism_corrected_CIL 0.5576335
auc_optimism_corrected_CIU 0.6959418
accuracy 0.7829716
accuracy_optimism_corrected 0.5998313
accuracy_optimism_corrected_CIL 0.5435465
accuracy_optimism_corrected_CIU 0.6506473
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 27.0000000
auc 0.8988058
auc_optimism_corrected 0.7833043
auc_optimism_corrected_CIL 0.7181715
auc_optimism_corrected_CIU 0.8345504
accuracy 0.8470588
accuracy_optimism_corrected 0.7398395
accuracy_optimism_corrected_CIL 0.6948382
accuracy_optimism_corrected_CIU 0.7959477
roc_c3.5.2.2.2.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 11.0000000
auc 0.9567826
auc_optimism_corrected 0.8368374
auc_optimism_corrected_CIL 0.8155406
auc_optimism_corrected_CIU 0.8653107
accuracy 0.8710462
accuracy_optimism_corrected 0.7147966
accuracy_optimism_corrected_CIL 0.6117988
accuracy_optimism_corrected_CIU 0.7849673
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 11.0000000
auc 0.8806571
auc_optimism_corrected 0.7054169
auc_optimism_corrected_CIL 0.6762969
auc_optimism_corrected_CIU 0.7323169
accuracy 0.7966667
accuracy_optimism_corrected 0.6665210
accuracy_optimism_corrected_CIL 0.6304974
accuracy_optimism_corrected_CIU 0.7159198
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 10.0000000
auc 0.9378971
auc_optimism_corrected 0.8158411
auc_optimism_corrected_CIL 0.7672457
auc_optimism_corrected_CIU 0.8676063
accuracy 0.8688845
accuracy_optimism_corrected 0.7515841
accuracy_optimism_corrected_CIL 0.7030274
accuracy_optimism_corrected_CIU 0.8059060
roc_c3.5.2.2.3 RF
3.5.2.2.3.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "35"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9853502"
auc_optimism_corrected_CIL "0.9603161"
auc_optimism_corrected_CIU "0.9975648"
accuracy "1"
accuracy_optimism_corrected "0.9357294"
accuracy_optimism_corrected_CIL "0.8963622"
accuracy_optimism_corrected_CIU "0.9711765"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "5"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.8473815"
auc_optimism_corrected_CIL "0.797293"
auc_optimism_corrected_CIU "0.8816524"
accuracy "1"
accuracy_optimism_corrected "0.747765"
accuracy_optimism_corrected_CIL "0.7101008"
accuracy_optimism_corrected_CIU "0.7993964"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "29"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.956065"
auc_optimism_corrected_CIL "0.9327369"
auc_optimism_corrected_CIU "0.9767077"
accuracy "1"
accuracy_optimism_corrected "0.9008109"
accuracy_optimism_corrected_CIL "0.8833349"
accuracy_optimism_corrected_CIU "0.9116247"
roc_c3.5.2.2.3.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "29"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9666467"
auc_optimism_corrected_CIL "0.9478243"
auc_optimism_corrected_CIU "0.9915155"
accuracy "1"
accuracy_optimism_corrected "0.9050136"
accuracy_optimism_corrected_CIL "0.8613362"
accuracy_optimism_corrected_CIU "0.9496371"
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "5"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.8200504"
auc_optimism_corrected_CIL "0.7705389"
auc_optimism_corrected_CIU "0.8738565"
accuracy "1"
accuracy_optimism_corrected "0.7253942"
accuracy_optimism_corrected_CIL "0.68768"
accuracy_optimism_corrected_CIU "0.7775975"
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "21"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.9442324"
auc_optimism_corrected_CIL "0.8986835"
auc_optimism_corrected_CIU "0.98245"
accuracy "1"
accuracy_optimism_corrected "0.8922164"
accuracy_optimism_corrected_CIL "0.8508627"
accuracy_optimism_corrected_CIU "0.9221786"
roc_c3.5.2.2.4 Gradient boosting
3.5.2.2.4.1 ASV level
level="ASV"pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9820545
auc_optimism_corrected_CIL 0.9690812
auc_optimism_corrected_CIU 0.9938243
accuracy 1.0000000
accuracy_optimism_corrected 0.9297256
accuracy_optimism_corrected_CIL 0.8889593
accuracy_optimism_corrected_CIU 0.9639860
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.8302454
auc_optimism_corrected_CIL 0.7760046
auc_optimism_corrected_CIU 0.8687548
accuracy 1.0000000
accuracy_optimism_corrected 0.7523700
accuracy_optimism_corrected_CIL 0.7024771
accuracy_optimism_corrected_CIU 0.8019913
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_ra",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 30.0000000
auc 1.0000000
auc_optimism_corrected 0.9764553
auc_optimism_corrected_CIL 0.9568100
auc_optimism_corrected_CIU 0.9874606
accuracy 1.0000000
accuracy_optimism_corrected 0.9273141
accuracy_optimism_corrected_CIL 0.8983308
accuracy_optimism_corrected_CIU 0.9509660
roc_c3.5.2.2.4.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.9710847
auc_optimism_corrected_CIL 0.9503172
auc_optimism_corrected_CIU 0.9855660
accuracy 1.0000000
accuracy_optimism_corrected 0.9064350
accuracy_optimism_corrected_CIL 0.8692260
accuracy_optimism_corrected_CIU 0.9273529
roc_cpre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 5.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.8269826
auc_optimism_corrected_CIL 0.7823150
auc_optimism_corrected_CIU 0.8818896
accuracy 1.0000000
accuracy_optimism_corrected 0.7418934
accuracy_optimism_corrected_CIL 0.6947787
accuracy_optimism_corrected_CIU 0.7919780
roc_cpost_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_ra",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.9680632
auc_optimism_corrected_CIL 0.9425710
auc_optimism_corrected_CIU 0.9886108
accuracy 1.0000000
accuracy_optimism_corrected 0.9205555
accuracy_optimism_corrected_CIL 0.8830395
accuracy_optimism_corrected_CIU 0.9530963
roc_c3.5.2.3 Saving results
models_list <- list()
for (model_name in names(supplements_models$models_summ)){
df <- do.call(rbind, supplements_models$models_summ[[model_name]])
models_list[[model_name]] <- df
}
write.xlsx(models_list,
file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
rowNames=TRUE)3.6 Results overview
3.6.0.1 Alpha diversity
knitr::kable(pc_observed[[segment]],
digits = 3,
caption = "Results of linear model testing ASV Richness")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -56.230 | 11.738 | 195.570 | -4.791 | 0.000 | 0.000 | *** |
| healthy vs pre_ltx - CZ vs NO | 14.193 | 10.341 | 194.538 | 1.372 | 0.171 | 0.256 | |
| healthy vs Grouppre_ltx:CountryNO | -7.489 | 15.169 | 194.358 | -0.494 | 0.622 | 0.622 | |
| pre_ltx vs Grouppost_ltx | 28.886 | 11.811 | 267.502 | 2.446 | 0.015 | 0.045 | * |
| pre_ltx vs post_ltx - CZ vs NO | 6.742 | 12.174 | 265.784 | 0.554 | 0.580 | 0.622 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -24.047 | 15.112 | 262.828 | -1.591 | 0.113 | 0.203 | |
| healthy vs Grouppost_ltx | -27.299 | 8.699 | 250.458 | -3.138 | 0.002 | 0.009 | ** |
| healthy vs post_ltx - CZ vs NO | 14.360 | 11.153 | 251.729 | 1.288 | 0.199 | 0.256 | |
| healthy vs Grouppost_ltx:CountryNO | -31.681 | 14.195 | 247.531 | -2.232 | 0.027 | 0.060 |
knitr::kable(pc_shannon[[segment]],
digits = 3,
caption = "Results of linear model testing Shannon index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.585 | 0.143 | 197.588 | -4.084 | 0.000 | 0.001 | *** |
| healthy vs pre_ltx - CZ vs NO | 0.015 | 0.126 | 196.213 | 0.116 | 0.907 | 0.909 | |
| healthy vs Grouppre_ltx:CountryNO | 0.081 | 0.185 | 195.965 | 0.437 | 0.663 | 0.852 | |
| pre_ltx vs Grouppost_ltx | 0.364 | 0.158 | 265.003 | 2.307 | 0.022 | 0.056 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.095 | 0.163 | 263.448 | 0.585 | 0.559 | 0.839 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.455 | 0.202 | 260.765 | -2.253 | 0.025 | 0.056 | |
| healthy vs Grouppost_ltx | -0.220 | 0.102 | 247.586 | -2.153 | 0.032 | 0.058 | |
| healthy vs post_ltx - CZ vs NO | 0.015 | 0.131 | 248.759 | 0.114 | 0.909 | 0.909 | |
| healthy vs Grouppost_ltx:CountryNO | -0.376 | 0.167 | 244.904 | -2.255 | 0.025 | 0.056 |
knitr::kable(pc_simpson[[segment]],
digits = 3,
caption = "Results of linear model testing Simpson index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.059 | 0.019 | 199.853 | -3.066 | 0.002 | 0.022 | * |
| healthy vs pre_ltx - CZ vs NO | -0.007 | 0.017 | 198.127 | -0.430 | 0.667 | 0.738 | |
| healthy vs Grouppre_ltx:CountryNO | 0.020 | 0.025 | 197.803 | 0.813 | 0.417 | 0.626 | |
| pre_ltx vs Grouppost_ltx | 0.031 | 0.026 | 263.530 | 1.213 | 0.226 | 0.455 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.013 | 0.027 | 261.964 | 0.476 | 0.635 | 0.738 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.052 | 0.033 | 259.262 | -1.556 | 0.121 | 0.363 | |
| healthy vs Grouppost_ltx | -0.027 | 0.017 | 246.082 | -1.594 | 0.112 | 0.363 | |
| healthy vs post_ltx - CZ vs NO | -0.007 | 0.022 | 247.135 | -0.335 | 0.738 | 0.738 | |
| healthy vs Grouppost_ltx:CountryNO | -0.032 | 0.028 | 243.697 | -1.147 | 0.253 | 0.455 |
knitr::kable(pc_pielou[[segment]],
digits = 3,
caption = "Results of linear model testing Pielou index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.057 | 0.020 | 201.130 | -2.828 | 0.005 | 0.046 | * |
| healthy vs pre_ltx - CZ vs NO | -0.008 | 0.018 | 199.017 | -0.446 | 0.656 | 0.781 | |
| healthy vs Grouppre_ltx:CountryNO | 0.015 | 0.026 | 198.602 | 0.565 | 0.573 | 0.781 | |
| pre_ltx vs Grouppost_ltx | 0.039 | 0.024 | 265.797 | 1.631 | 0.104 | 0.234 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.007 | 0.025 | 263.735 | 0.274 | 0.785 | 0.785 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.062 | 0.031 | 260.213 | -2.015 | 0.045 | 0.202 | |
| healthy vs Grouppost_ltx | -0.018 | 0.016 | 247.076 | -1.130 | 0.260 | 0.467 | |
| healthy vs post_ltx - CZ vs NO | -0.008 | 0.020 | 248.451 | -0.394 | 0.694 | 0.781 | |
| healthy vs Grouppost_ltx:CountryNO | -0.047 | 0.026 | 243.881 | -1.828 | 0.069 | 0.206 |
Plots
alpha_div_plots[[paste(segment,"Country")]]3.6.0.2 Beta diversity
Main analysis
knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
digits = 3,
caption = "Results of PERMANOVA - robust aitchison distance")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1222.921 | 7.950 | 0.019 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 630.146 | 4.062 | 0.007 | 0.008 | 0.008 | ** |
| post_ltx vs healthy | 1 | 1736.169 | 11.014 | 0.021 | 0.001 | 0.002 | ** |
| pre_ltx vs healthy , Country | 1 | 738.576 | 4.801 | 0.011 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1532.675 | 9.879 | 0.016 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1728.746 | 10.966 | 0.021 | 0.001 | 0.001 | *** |
| pre_ltx vs healthy : Country | 1 | 318.719 | 2.077 | 0.005 | 0.325 | 0.488 | |
| pre_ltx vs post_ltx : Country | 1 | 322.405 | 2.082 | 0.003 | 0.524 | 0.524 | |
| post_ltx vs healthy : Country | 1 | 398.820 | 2.538 | 0.005 | 0.112 | 0.336 |
PCA
pca_plots_list[[paste(segment,"genus custom")]]Supplementary analysis
knitr::kable(
supplements_beta[!grepl("PCoA",names(supplements_beta)) &
(grepl("genus",names(supplements_beta))) &
(grepl(segment,names(supplements_beta)))],
digits = 3,
caption = "Supplementary PERMANOVA results: Bray-curtis, Jaccard distances")
|
|
PCA
plot_list <- supplements_beta[grepl("PCoA",names(supplements_beta)) &
grepl(segment,names(supplements_beta))]
ggarrange(plotlist = plot_list,
labels=names(plot_list),
font.label = list(size=5,face="plain"),
ncol=2,nrow=3)3.6.0.3 Univariate analysis
Number of significant taxa
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
as.data.frame(lapply(psc_effect,nrow))) %>% t() %>%
`colnames<-`("Count") %>%
`rownames<-`(c(names(list_intersections),"PSC effect ASV","PSC effect Genus","PSC effect Phylum")),caption="Number of significant taxa")| Count | |
|---|---|
| terminal_ileum genus healthy vs pre_ltx | 25 |
| terminal_ileum genus pre_ltx vs post_ltx | 4 |
| terminal_ileum genus healthy vs post_ltx | 38 |
| terminal_ileum ASV healthy vs pre_ltx | 42 |
| terminal_ileum ASV pre_ltx vs post_ltx | 2 |
| terminal_ileum ASV healthy vs post_ltx | 65 |
| terminal_ileum phylum healthy vs pre_ltx | 1 |
| terminal_ileum phylum pre_ltx vs post_ltx | 0 |
| terminal_ileum phylum healthy vs post_ltx | 3 |
| colon genus healthy vs pre_ltx | 33 |
| colon genus pre_ltx vs post_ltx | 4 |
| colon genus healthy vs post_ltx | 41 |
| colon ASV healthy vs pre_ltx | 55 |
| colon ASV pre_ltx vs post_ltx | 1 |
| colon ASV healthy vs post_ltx | 101 |
| colon phylum healthy vs pre_ltx | 2 |
| colon phylum pre_ltx vs post_ltx | 1 |
| colon phylum healthy vs post_ltx | 3 |
| PSC effect ASV | 22 |
| PSC effect Genus | 37 |
| PSC effect Phylum | 0 |
3.6.0.4 Machine learning
Main models
knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
digits=3,caption="Elastic net results")| alpha | lambda | auc_optimism_corrected | auc_optimism_corrected_CIL | auc_optimism_corrected_CIU | |
|---|---|---|---|---|---|
| pre_ltx vs healthy ASV colon | 0.2 | 0.007 | 0.944 | 0.879 | 0.987 |
| pre_ltx vs post_ltx ASV colon | 1.0 | 0.004 | 0.788 | 0.716 | 0.843 |
| post_ltx vs healthy ASV colon | 0.4 | 0.002 | 0.969 | 0.937 | 0.987 |
| pre_ltx vs healthy genus colon | 0.0 | 0.043 | 0.934 | 0.882 | 0.972 |
| pre_ltx vs post_ltx genus colon | 0.0 | 0.052 | 0.757 | 0.676 | 0.833 |
| post_ltx vs healthy genus colon | 0.6 | 0.002 | 0.961 | 0.923 | 0.985 |
ROC - Genus level
roc_curve_all_custom(roc_cs[c(10:12)],Q="Q1",
model_name="enet_model")Supplementary models
# Build final dataframe
models_list[["enet_model"]] <- rbind(models_summ_df_ileum,models_summ_df_colon)
final_df <- tibble(row_names = rownames(models_list[[1]]))
# Loop through models and extract required values
for (model_name in names(models_list)) {
model_df <- models_list[[model_name]]
# Combine AUC_optimism_corrected with its CI values
final_df[[model_name]] <- paste0(
round(model_df$auc_optimism_corrected, 3),
" (", round(model_df$auc_optimism_corrected_CIL, 3), "; ",
round(model_df$auc_optimism_corrected_CIU, 3), ")"
)
}
knitr::kable(final_df, caption="All models")| row_names | knn_model | rf_model | gbm_model | enet_model_ra | knn_model_ra | rf_model_ra | gbm_model_ra | enet_model |
|---|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy ASV terminal_ileum | 0.871 (0.761; 0.954) | 0.947 (0.873; 0.995) | 0.955 (0.902; 0.992) | 0.924 (0.837; 0.972) | 0.829 (0.699; 0.932) | 0.975 (0.95; 0.993) | 0.969 (0.947; 0.995) | 0.956 (0.894; 0.996) |
| pre_ltx vs post_ltx ASV terminal_ileum | 0.686 (0.555; 0.794) | 0.856 (0.756; 0.941) | 0.868 (0.787; 0.935) | 0.734 (0.617; 0.834) | 0.666 (0.583; 0.736) | 0.901 (0.85; 0.953) | 0.882 (0.825; 0.933) | 0.865 (0.785; 0.932) |
| post_ltx vs healthy ASV terminal_ileum | 0.87 (0.78; 0.953) | 0.9 (0.817; 0.971) | 0.925 (0.852; 0.982) | 0.888 (0.818; 0.94) | 0.824 (0.743; 0.909) | 0.925 (0.878; 0.964) | 0.915 (0.849; 0.959) | 0.958 (0.9; 0.99) |
| pre_ltx vs healthy genus terminal_ileum | 0.887 (0.8; 0.96) | 0.95 (0.877; 0.996) | 0.973 (0.928; 1) | 0.939 (0.896; 0.979) | 0.819 (0.765; 0.874) | 0.967 (0.945; 0.986) | 0.97 (0.942; 0.991) | 0.966 (0.916; 0.996) |
| pre_ltx vs post_ltx genus terminal_ileum | 0.727 (0.608; 0.833) | 0.824 (0.719; 0.906) | 0.83 (0.74; 0.912) | 0.737 (0.654; 0.812) | 0.718 (0.66; 0.792) | 0.86 (0.825; 0.904) | 0.835 (0.784; 0.889) | 0.834 (0.749; 0.915) |
| post_ltx vs healthy genus terminal_ileum | 0.921 (0.857; 0.974) | 0.912 (0.821; 0.979) | 0.937 (0.868; 0.983) | 0.857 (0.801; 0.894) | 0.801 (0.734; 0.886) | 0.916 (0.847; 0.982) | 0.947 (0.907; 0.983) | 0.966 (0.922; 0.994) |
| pre_ltx vs healthy ASV colon | 0.839 (0.764; 0.904) | 0.96 (0.918; 0.991) | 0.971 (0.941; 0.995) | 0.958 (0.92; 0.979) | 0.831 (0.762; 0.913) | 0.985 (0.96; 0.998) | 0.982 (0.969; 0.994) | 0.944 (0.879; 0.987) |
| pre_ltx vs post_ltx ASV colon | 0.631 (0.548; 0.719) | 0.795 (0.759; 0.838) | 0.806 (0.756; 0.853) | 0.693 (0.6; 0.759) | 0.646 (0.558; 0.696) | 0.847 (0.797; 0.882) | 0.83 (0.776; 0.869) | 0.788 (0.716; 0.843) |
| post_ltx vs healthy ASV colon | 0.865 (0.805; 0.901) | 0.918 (0.879; 0.951) | 0.964 (0.94; 0.986) | 0.928 (0.884; 0.965) | 0.783 (0.718; 0.835) | 0.956 (0.933; 0.977) | 0.976 (0.957; 0.987) | 0.969 (0.937; 0.987) |
| pre_ltx vs healthy genus colon | 0.863 (0.782; 0.93) | 0.949 (0.895; 0.986) | 0.966 (0.924; 0.992) | 0.942 (0.912; 0.973) | 0.837 (0.816; 0.865) | 0.967 (0.948; 0.992) | 0.971 (0.95; 0.986) | 0.934 (0.882; 0.972) |
| pre_ltx vs post_ltx genus colon | 0.652 (0.555; 0.734) | 0.777 (0.683; 0.856) | 0.799 (0.721; 0.867) | 0.714 (0.648; 0.788) | 0.705 (0.676; 0.732) | 0.82 (0.771; 0.874) | 0.827 (0.782; 0.882) | 0.757 (0.676; 0.833) |
| post_ltx vs healthy genus colon | 0.89 (0.825; 0.951) | 0.936 (0.867; 0.986) | 0.963 (0.922; 0.992) | 0.907 (0.859; 0.938) | 0.816 (0.767; 0.868) | 0.944 (0.899; 0.982) | 0.968 (0.943; 0.989) | 0.961 (0.923; 0.985) |
write.csv(final_df,file=file.path(path,"AUC_all_models.csv"),row.names = FALSE)ROC - genus
rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs
plot_list <- list()
for (model_name in names(rocs_list)) {
plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
Q="Q1",
model_name=model_name)
}
p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
p4 Paper-ready visualizations
4.1 Alpha diversity
p_A <- alpha_div_plots$`terminal_ileum Country` +
ggtitle("Terminal ileum")+
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
p_B <- alpha_div_plots$`colon Country` +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
Q1_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,widths = c(1,0.1,1))
Q1_alpha4.2 Beta diversity
pca_ti <- pca_plots_list$`terminal_ileum genus custom`
pca_colon <- pca_plots_list$`colon genus custom`
genus_Q1_beta <- ggarrange(pca_ti,
ggplot() + theme_minimal(),
pca_colon,ncol=3,
widths = c(1,0.1,1))
genus_Q1_betaAlpha + Beta diversity
alpha_beta <- ggarrange(Q1_alpha,genus_Q1_beta,
ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta4.3 Elastic net
Genus level
#models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
#names(models_to_plot) <- c("kNN","RF","GBoost","ENET")
models_to_plot <- c("enet_model")
names(models_to_plot) <- c("ENET")
# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
plot_list_ileum[[model_name]] <-
roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
Q="Q1",
model_name=model_name,legend = FALSE) +
#ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) +
theme(plot.title = element_text(face = "bold",size = 10))
}
roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)
# COLON
plot_list_colon <- list()
for (model_name in models_to_plot) {
plot_list_colon[[model_name]] <-
roc_curve_all_custom(rocs_list[[model_name]][c(10:12)],
Q="Q1",
model_name=model_name,legend = FALSE) +
#ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) +
theme(plot.title = element_text(face = "bold",size = 10))
}
roc_curve_colon <- ggarrange(plotlist = plot_list_colon)
roc_curve_plot <- ggarrange(roc_curve_ti,
ggplot() + theme_minimal(),
roc_curve_colon,
ncol=3, widths = c(1,0.1,1))
roc_curve_plot4.4 FIGURE 2
alpha_beta_elastic <- ggarrange(Q1_alpha,genus_Q1_beta,roc_curve_plot,
ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))
alpha_beta_elasticalpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
widths = c(1,0.15))
alpha_beta_elastic4.5 FIGURE3 - DAA
p_ileum <- dot_heatmap_ileum +
ggtitle("Terminal ileum") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
legend.position = "none")
p_colon <- dot_heatmap_colon +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
legend.position = "none")
heatmap_plot <- ggarrange(p_ileum,
p_colon,
ncol = 2,labels=c("A","B"))
heatmap_plot 5 Session info
sessionInfo()R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=Czech_Czechia.utf8 LC_CTYPE=Czech_Czechia.utf8
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C
[5] LC_TIME=Czech_Czechia.utf8
time zone: Europe/Prague
tzcode source: internal
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] mice_3.16.0 picante_1.8.2 ape_5.8
[4] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[7] tidyverse_2.0.0 kableExtra_1.4.0 tidyr_1.3.1
[10] gbm_2.2.2 doParallel_1.0.17 iterators_1.0.14
[13] foreach_1.5.2 ranger_0.17.0 ggvenn_0.1.15
[16] Maaslin2_1.16.0 purrr_1.0.2 pROC_1.18.5
[19] glmnet_4.1-8 MicrobiomeStat_1.2 caret_6.0-94
[22] openxlsx_4.2.6.1 magrittr_2.0.3 emmeans_1.10.7
[25] lmerTest_3.1-3 robustlmm_3.3-1 lme4_1.1-35.5
[28] Matrix_1.6-5 mgcv_1.9-1 nlme_3.1-164
[31] pheatmap_1.0.12 reshape2_1.4.4 vegan_2.6-4
[34] lattice_0.22-6 permute_0.9-7 ggplotify_0.1.2
[37] ggrepel_0.9.5 ggpubr_0.6.0 MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0 ggplot2_3.5.1 tibble_3.2.1
[43] dplyr_1.1.4 cowplot_1.1.3 readr_2.1.5
[46] igraph_2.0.3 ccrepe_1.38.1 data.table_1.15.4
loaded via a namespace (and not attached):
[1] fs_1.6.4 matrixStats_1.3.0
[3] bitops_1.0-7 RColorBrewer_1.1-3
[5] numDeriv_2016.8-1.1 tools_4.3.1
[7] backports_1.4.1 R6_2.6.1
[9] lazyeval_0.2.2 jomo_2.7-6
[11] rhdf5filters_1.14.1 withr_3.0.2
[13] gridExtra_2.3 cli_3.6.2
[15] Biobase_2.62.0 logging_0.10-108
[17] biglm_0.9-3 sandwich_3.1-1
[19] labeling_0.4.3 mvtnorm_1.2-4
[21] robustbase_0.99-3 pbapply_1.7-2
[23] systemfonts_1.1.0 yulab.utils_0.2.0
[25] svglite_2.1.3 parallelly_1.38.0
[27] rstudioapi_0.17.1 generics_0.1.3
[29] gridGraphics_0.5-1 shape_1.4.6.1
[31] car_3.1-3 zip_2.3.1
[33] biomformat_1.30.0 S4Vectors_0.40.2
[35] abind_1.4-8 infotheo_1.2.0.1
[37] lifecycle_1.0.4 multcomp_1.4-28
[39] yaml_2.3.8 carData_3.0-5
[41] SummarizedExperiment_1.32.0 Rtsne_0.17
[43] rhdf5_2.46.1 recipes_1.1.1
[45] SparseArray_1.2.4 grid_4.3.1
[47] mitml_0.4-5 crayon_1.5.3
[49] pillar_1.10.1 knitr_1.50
[51] optparse_1.7.5 GenomicRanges_1.54.1
[53] statip_0.2.3 boot_1.3-31
[55] estimability_1.5.1 future.apply_1.11.3
[57] codetools_0.2-20 pan_1.9
[59] glue_1.7.0 ggfun_0.1.8
[61] vctrs_0.6.5 treeio_1.26.0
[63] gtable_0.3.6 maditr_0.8.4
[65] gower_1.0.1 xfun_0.51
[67] S4Arrays_1.2.1 prodlim_2024.06.25
[69] libcoin_1.0-10 coda_0.19-4.1
[71] pcaPP_2.0-4-1 modeest_2.4.0
[73] survival_3.5-8 timeDate_4041.110
[75] hardhat_1.4.1 lava_1.8.1
[77] statmod_1.5.0 TH.data_1.1-3
[79] ipred_0.9-15 ggtree_3.10.1
[81] GenomeInfoDb_1.38.8 fBasics_4032.96
[83] rpart_4.1.23 colorspace_2.1-0
[85] BiocGenerics_0.48.1 DBI_1.2.3
[87] nnet_7.3-19 ade4_1.7-22
[89] tidyselect_1.2.1 timeSeries_4041.111
[91] compiler_4.3.1 microbiome_1.24.0
[93] xml2_1.3.6 DelayedArray_0.28.0
[95] scales_1.3.0 DEoptimR_1.1-3-1
[97] spatial_7.3-17 digest_0.6.35
[99] minqa_1.2.8 rmarkdown_2.29
[101] XVector_0.42.0 htmltools_0.5.8.1
[103] pkgconfig_2.0.3 MatrixGenerics_1.14.0
[105] stabledist_0.7-2 fastmap_1.2.0
[107] rlang_1.1.3 htmlwidgets_1.6.4
[109] farver_2.1.2 zoo_1.8-12
[111] jsonlite_1.8.8 ModelMetrics_1.2.2.2
[113] RCurl_1.98-1.14 modeltools_0.2-23
[115] Formula_1.2-5 GenomeInfoDbData_1.2.11
[117] patchwork_1.3.0 Rhdf5lib_1.24.2
[119] munsell_0.5.1 Rcpp_1.0.12
[121] ggnewscale_0.5.1 fastGHQuad_1.0.1
[123] stringi_1.8.3 ggstar_1.0.4
[125] stable_1.1.6 zlibbioc_1.48.2
[127] MASS_7.3-60.0.1 plyr_1.8.9
[129] listenv_0.9.1 Biostrings_2.70.3
[131] splines_4.3.1 hash_2.2.6.3
[133] multtest_2.58.0 hms_1.1.3
[135] ggtreeExtra_1.12.0 ggsignif_0.6.4
[137] stats4_4.3.1 rmutil_1.1.10
[139] evaluate_1.0.3 nloptr_2.1.1
[141] tzdb_0.4.0 getopt_1.20.4
[143] future_1.34.0 clue_0.3-65
[145] coin_1.4-3 broom_1.0.7
[147] xtable_1.8-4 tidytree_0.4.6
[149] rstatix_0.7.2 viridisLite_0.4.2
[151] class_7.3-22 aplot_0.2.5
[153] IRanges_2.36.0 cluster_2.1.6
[155] timechange_0.3.0 globals_0.16.3